Abstract: We study sequential coding of Markov sources 
under an error propagation constraint. An encoder sequentially 
compresses a sequence of vector-sources that are spatially i.i.d. 
but temporally correlated according to a first-order Markov 
process. The channel erases up to B packets in a single burst, 
but reveals all other packets to the destination. The destination is 
required to reproduce all the source-vectors instantaneously and 
in a lossless manner, except those sequences that occur in an error 
propagation window of length B + W following the start of the 
erasure burst. We define the rate-recovery function $R(B, W)$, 
the minimum achievable compression rate per source sample in 
this framework, and develop upper and lower bounds on this 
function. Our upper bound is obtained using a random binning 
technique, whereas our lower bound is obtained by drawing 
connections to multi-terminal source coding. Our upper and 
lower bounds coincide, yielding $R(B, W)$, in some special cases. 
More generally, both the upper and lower bounds equal the rate 
for predictive coding plus a term that decreases as $1/(W+1)$, thus 
establishing a scaling behaviour of the rate-recovery function. 

For a special class of semi-deterministic Markov sources we 
propose a new optimal coding scheme: prospicient coding. An 
extension of this coding technique to Gaussian sources is also 
developed. For the class of symmetric Markov sources and 
memoryless encoders, we establish the optimality of random 
binning. When the destination is required to reproduce each 
source sequence with a fixed delay and when W = 0, we also 
establish the optimality of binning. 

Index Terms: Streaming source coding, rate-distortion theory, 
sequential coding, source coding, video coding. 

I. Introduction 

The trade-off between compression efficiency and error resilience 
is fundamental to any video compression system. 
In live video streaming, an encoder observes a sequence of 
correlated video frames and produces a compressed bit-stream 
that is transmitted to the destination. If the underlying channel 
is an ideal bit-pipe, it is well known that predictive coding [1] 
achieves the optimum compression rate. Unfortunately, packet 
losses are unavoidable in many emerging video distribution 
systems with stringent delay constraints. Predictive coding 
is highly sensitive to such packet losses and can lead to a 
significant amount of error propagation. In practice, various 
mechanisms have been engineered to combat such losses. 
For example, video codecs use a group of pictures (GOP) 
architecture, where intra-frames are periodically inserted to 
limit the effect of error propagation. Forward error correction 
codes can also be applied to compressed bit-streams to 
recover any missing packets [2], [3]. Modifications to predictive 
coding, such as leaky-DPCM [4], [5], have been proposed in 
the literature to deal with packet losses. The robustness of 
distributed video coding techniques in such situations has been 
studied in, e.g., [6], [7]. 

Information theoretic analysis of video coding has received 
significant attention in recent times; see, e.g., [8]-[10] and the 
references therein. These works focus primarily on the source 
coding aspects of video. The source process is modeled as 
a sequence of vectors, each of which is spatially i.i.d. and 
temporally correlated. The encoder is generally restricted to be 
either causal or having a limited look-ahead. The destination is 
required to output each source vector in a sequential manner. 
However, all of these works assume an ideal channel with no 
packet losses. To our knowledge, even the effect of a single 
isolated packet loss is not fully understood [11]. 

In this work, we study a fundamental trade-off between 
error propagation and compression rate in sequential source 
coding when the channel introduces packet losses. The encoder 
compresses the source-vector sequence in a causal manner and 
the receiver is required to recover each source sequence in 
an instantaneous and lossless manner. The channel introduces 
a burst of B erasures and the destination is not required 
to recover B + W source sequences following the start of 
the erasure burst. We introduce the rate-recovery function 
$R(B, W)$, the minimum achievable compression rate in 
this framework. Upper and lower bounds on this function are 
developed. The upper bound is obtained using a binning based 
scheme. The lower bound is obtained by drawing connections 
to a multi-terminal source coding problem. Conditions under 
which the upper and lower bounds coincide are discussed. In 
particular, we establish that the rate-recovery function equals 
the predictive coding rate plus a term that decreases as $1/(W+1)$, 
where W is the length of the error propagation window. 

We also study special classes of sources for which the binning-based 
upper bound can be improved by exploiting the underlying 
structure. First, we consider the linear semi-deterministic 
Markov source and develop a new coding technique, 
prospicient coding, that meets the lower bound for all values 
of B and W. In our proposed scheme, we first transform 
the linear semi-deterministic source into a simpler diagonally- 
correlated source. For the latter class, we provide an explicit 
coding scheme that meets the lower bound. We also extend the 
proposed coding technique to an i.i.d. Gaussian source process, 
where the receiver is required to recover source sequences in 
a sliding window of fixed length. Numerical results indicate 
significant improvements of the proposed coding scheme over 
techniques such as FEC based coding and naive binning. 

For the class of symmetric sources, with an additional 
assumption of a memoryless encoder, we establish the optimality 
of a binning-based technique. This is done by establishing a 
connection with another multi-terminal source coding problem, 
the Zig-Zag source network with side information [12]. For 
our streaming problem, we only need to lower bound the 
sum-rate for this network, which we do by exploiting the symmetric 
nature of the underlying sources. 

Fig. 1. Problem Setup: The encoder output $f_j$ is a function of the source sequences up to time $j$, i.e., $s_0^n, s_1^n, \ldots, s_j^n$. The channel introduces an erasure 
burst of length B. The decoder produces $\hat{s}_j^n$ upon observing the sequence $g_0^j$. As indicated, the decoder is not required to produce those source sequences that 
fall in a window of length B + W following the start of an erasure burst. 

As another extension, we consider the case when the decoder 
is allowed a fixed decoding delay of T frames. When 
W = 0, we again establish the optimality of binning. For the 
converse, we introduce a periodic erasure channel of period 
B + T + 1, where the first B packets are erased. We argue 
that the decoder can recover each of the remaining source 
sequences by its deadline and invoke the source coding 
theorem to find a lower bound on the rate-recovery function. 

The remainder of the paper is organized as follows. The 
problem setup is described in Section II and a summary of 
the main results is provided in Section III. Our upper and 
lower bounds on the rate-recovery function are established 
in Section IV. The prospicient coding scheme is described 
for the class of diagonally correlated deterministic sources in 
Section V, for the linear semi-deterministic sources in Section VI, 
and for Gaussian sources in Section VII. The optimality of 
binning for symmetric sources is established in Section VIII, 
whereas the case of a delay-constrained decoder is treated in 
Section IX. Conclusions are provided in Section X. 

II. Problem Statement 

In this section we describe the source and channel models 
as well as our notion of an error-propagation window and the 
associated rate-recovery function. 

A. Source Model 

We consider a semi-infinite stationary vector source process 
$\{s_t^n\}_{t \ge 0}$¹ whose symbols (defined over some finite alphabet $\mathcal{S}$) 
are drawn independently across the spatial dimension and from 
a first-order Markov chain across the temporal dimension, i.e., 
for each $t \ge 1$, 

$$ \Pr\left(\mathsf{s}_t^n = s_t^n \,\middle|\, \mathsf{s}_{t-1}^n = s_{t-1}^n, \mathsf{s}_{t-2}^n = s_{t-2}^n, \ldots\right) = \prod_{j=1}^{n} p_{\mathsf{s}_1|\mathsf{s}_0}(s_{t,j} \mid s_{t-1,j}), \quad \forall t \ge 1. \tag{1} $$

¹In establishing our coding theorems, we assume that the source process 
starts at t = -1 (or before if required) and that all the sequences with a 
negative index are revealed to the destination. We will also assume that the 
transmission terminates after a sufficiently long period. 

We assume that the underlying random variables $\{s_t\}_{t \ge 0}$ 
constitute a time-invariant, stationary, first-order 
Markov chain with a common marginal distribution denoted 
by $p_s(\cdot)$. Such models are used in earlier works on sequential 
source coding; see, e.g., [9] for some justification. We remark 
that the results for lossless recovery also generalize when 
the source sequence is a stationary process (not necessarily 
i.i.d.) in the spatial dimension. However, the extension to 
higher-order Markov processes appears non-trivial. 

B. Channel Model 

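The channel introduces an erasure burst of size B, i.e., for 
some particular $j \ge 0$, it introduces an erasure burst such that 
$g_i = \star$ for $i \in \{j, j+1, \ldots, j+B-1\}$ and $g_i = f_i$ otherwise, 
i.e., 

$$ g_i = \begin{cases} \star, & i \in \{j, j+1, \ldots, j+B-1\}, \\ f_i, & \text{otherwise}. \end{cases} \tag{2} $$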

C. Rate-Recovery Function 

A rate-R causal encoder maps the sequence $\{s_i^n\}_{i \ge 0}$ to an 
index $f_i \in [1, 2^{nR}]$ according to some function 

$$ f_i = \mathcal{F}_i(s_0^n, \ldots, s_i^n) \tag{3} $$

for each $i \ge 0$. For most of our discussion we will assume 
causal encoders. Furthermore, a memoryless encoder satisfies 
$\mathcal{F}_i(s_0^n, \ldots, s_i^n) = \mathcal{F}_i(s_i^n)$, i.e., the encoder does not use the 
knowledge of the past sequences. Naturally, a memoryless 
encoder is very restricted and we will only use it to establish 
some special results. 
Upon observing the sequence $\{g_i\}_{i \ge 0}$, the decoder is 
required to perfectly recover all the source sequences using 
decoding functions 

$$ \hat{s}_i^n = \mathcal{G}_i(g_0, g_1, \ldots, g_i), \quad i \notin \{j, \ldots, j+B+W-1\}, \tag{4} $$

where j denotes the time at which the erasure burst starts 
in (2). It is, however, not required to produce the source 
sequences in the window of length B + W following the start 
of an erasure burst. We call this period the error propagation 
window. The setup is shown in Fig. 1. 

A rate R is feasible if there exists a sequence of 
encoding and decoding functions and a sequence $\varepsilon_n$ that 
approaches zero as $n \to \infty$ such that $\Pr(s_i^n \neq \hat{s}_i^n) \le \varepsilon_n$ 
for all $i \notin \{j, \ldots, j+B+W-1\}$. We seek the minimum 
feasible rate $R(B, W)$, which we define to be the rate-recovery 
function. 

III. Main Results 
In this section we discuss the main results of this paper. 

A. Upper and Lower Bounds 

Theorem 1. For any stationary first-order Markov source 
process, the rate-recovery function satisfies 
$R^-(B, W) \le R(B, W) \le R^+(B, W)$, where 

$$ R^+(B, W) = H(s_1 \mid s_0) + \frac{1}{W+1} I(s_B; s_{B+1} \mid s_0), \tag{5} $$

$$ R^-(B, W) = H(s_1 \mid s_0) + \frac{1}{W+1} I(s_B; s_{B+W+1} \mid s_0). \tag{6} $$



Notice that the upper and lower bounds coincide for W = 0 
and $W \to \infty$, yielding the rate-recovery function in these 
cases. More generally, we can interpret the term $H(s_1 \mid s_0)$ as 
the rate associated with ideal predictive coding in the absence 
of any erasures. Theorem 1 suggests that the rate-recovery 
function equals $H(s_1 \mid s_0)$ plus a term that decreases as $1/(W+1)$. 

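The sketch below is a rough numerical illustration of Theorem 1 and is not part of the original analysis. It evaluates (5) and (6) for the binary symmetric Markov source $s_t = s_{t-1} \oplus z_t$ of Section III-D, assuming a uniform marginal and a flip probability $\Pr(z_t = 1) = p$. This convention gives the same entropies as $\Pr(z_{t,i} = 0) = p$. The sketch relies on the standard fact that such a chain flips with probability $(1-(1-2p)^k)/2$ after k steps, so every conditional entropy reduces to a binary entropy. The function names are hypothetical.

```python
import math


def h2(x):
    """Binary entropy function in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def flip_prob(p, k):
    """Pr(s_k != s_0) for s_t = s_{t-1} XOR z_t with Pr(z_t = 1) = p (assumed)."""
    return (1 - (1 - 2 * p) ** k) / 2


def rate_bounds(p, B, W):
    """Return (R^-(B, W), R^+(B, W)) from (5)-(6) in bits per source symbol."""
    predictive = h2(p)  # H(s_1 | s_0): ideal predictive-coding rate
    # Markov property: I(s_B; s_{B+m} | s_0) = H(s_{B+m} | s_0) - H(s_{B+m} | s_B)
    mi_upper = h2(flip_prob(p, B + 1)) - h2(flip_prob(p, 1))          # m = 1
    mi_lower = h2(flip_prob(p, B + W + 1)) - h2(flip_prob(p, W + 1))  # m = W + 1
    return predictive + mi_lower / (W + 1), predictive + mi_upper / (W + 1)


if __name__ == "__main__":
    for W in range(5):
        lo, hi = rate_bounds(p=0.1, B=2, W=W)
        print(f"W={W}: R^- = {lo:.4f}, R^+ = {hi:.4f}")
```

The two bounds coincide at W = 0. Both approach the predictive-coding rate $h_2(p) = H(s_1 \mid s_0)$ as W grows, in line with the remark above.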
The upper bound is obtained using a binning-based scheme. 
At each time the encoding function $f_i$ in (3) is the bin index 
of a Slepian-Wolf codebook [13]. Following an erasure burst 
in $[j, j+B-1]$, the decoder collects $f_{j+B}, \ldots, f_{j+W+B}$ and 
attempts to jointly recover all the underlying sources at 
$t = j+W+B$. The following corollary provides an alternate 
expression for the achievable rate and makes the connection 
to the binning technique more explicit. 

Corollary 1. For any first-order Markov source process 
defined in Section II-A, the upper bound in (5) can also be 
expressed as 

$$ R^+(B, W) = \frac{1}{W+1} H(s_{B+1}, s_{B+2}, \ldots, s_{B+W+1} \mid s_0). \tag{7} $$



The proof of Corollary 1 is provided in Appendix A. 
Although our framework assumes a single isolated erasure 
burst, we note that the coding scheme enables recovery in the 
presence of multiple erasure bursts, provided there is a guard 
interval of at least W + 1 between these bursts. 

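Corollary 1 can be sanity-checked numerically. The illustrative sketch below enumerates the joint pmf of a small stationary Markov chain by brute force and compares (5) with (7). The three-state doubly stochastic transition matrix is an arbitrary assumption, chosen so that the uniform distribution is stationary. The helper names are hypothetical.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical 3-state chain; doubly stochastic, so the uniform pmf is stationary.
PI = [1 / 3, 1 / 3, 1 / 3]
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]


def path_pmf(length):
    """Joint pmf of (s_0, ..., s_{length-1}) started from the stationary pmf."""
    pmf = {}
    for path in itertools.product(range(len(PI)), repeat=length):
        prob = PI[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        pmf[path] = prob
    return pmf


def H(pmf, idx):
    """Joint entropy (bits) of the time indices listed in idx."""
    marg = defaultdict(float)
    for path, prob in pmf.items():
        marg[tuple(path[i] for i in idx)] += prob
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)


def Hc(pmf, a, b):
    """Conditional entropy H(s_a | s_b) for index collections a and b."""
    return H(pmf, list(a) + list(b)) - H(pmf, list(b))


def r_plus_eq5(B, W):
    pmf = path_pmf(B + 2)
    mi = Hc(pmf, [B], [0]) - Hc(pmf, [B], [B + 1, 0])  # I(s_B; s_{B+1} | s_0)
    return Hc(pmf, [1], [0]) + mi / (W + 1)


def r_plus_eq7(B, W):
    pmf = path_pmf(B + W + 2)
    return Hc(pmf, range(B + 1, B + W + 2), [0]) / (W + 1)
```

Both functions depend only on the chain rule, the Markov property and stationarity. For any stationary chain they should therefore agree up to floating-point error.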
Fig. 2. A multi-terminal source coding problem related to the proposed 
streaming setup. The erasure at time t = 3k leads to two virtual decoders 
with different side information as shown. 

Our lower bound involves two key ideas that we illustrate 
below for the case when W = 1 and B = 1. First, we develop 
the following equivalent expression of (6), which is easier to 
interpret: 

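\begin{align}
R^-(B = 1, W = 1) &= H(s_1 \mid s_0) + \tfrac{1}{2} I(s_1; s_3 \mid s_0) \tag{8}\\
&= H(s_1 \mid s_0) + \tfrac{1}{2} H(s_3 \mid s_0) - \tfrac{1}{2} H(s_3 \mid s_0, s_1) \tag{9}\\
&= \tfrac{1}{2} H(s_1, s_2 \mid s_0) + \tfrac{1}{2} H(s_3 \mid s_0) - \tfrac{1}{2} H(s_3 \mid s_1) \tag{10}\\
&= \tfrac{1}{2} H(s_1 \mid s_0, s_2) + \tfrac{1}{2} H(s_3 \mid s_0), \tag{11}
\end{align}

where both (10) and (11) follow from the first-order Markov 
chain property $s_0 \to s_1 \to s_2 \to s_3$, together with the stationarity 
of the source process. 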

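The chain (8)-(11) uses only the Markov property and stationarity, so it can be checked numerically. The illustrative sketch below builds the joint pmf of $(s_0, s_1, s_2, s_3)$ for an assumed three-state doubly stochastic chain (hypothetical parameters) and compares expression (8) with expression (11).

```python
import itertools
import math
from collections import defaultdict

PI = [1 / 3, 1 / 3, 1 / 3]  # stationary for the doubly stochastic P below (assumed)
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]

# Joint pmf of (s_0, s_1, s_2, s_3).
PMF = {}
for path in itertools.product(range(3), repeat=4):
    prob = PI[path[0]]
    for a, b in zip(path, path[1:]):
        prob *= P[a][b]
    PMF[path] = prob


def H(*idx):
    """Joint entropy (bits) of the listed time indices."""
    marg = defaultdict(float)
    for path, prob in PMF.items():
        marg[tuple(path[i] for i in idx)] += prob
    return -sum(q * math.log2(q) for q in marg.values() if q > 0)


def Hc(a, b):
    """H(s_a | s_b) with a, b tuples of time indices."""
    return H(*a, *b) - H(*b)


# (8): H(s_1|s_0) + 1/2 I(s_1; s_3 | s_0)
eq8 = Hc((1,), (0,)) + 0.5 * (Hc((3,), (0,)) - Hc((3,), (0, 1)))
# (11): 1/2 H(s_1 | s_0, s_2) + 1/2 H(s_3 | s_0)
eq11 = 0.5 * Hc((1,), (0, 2)) + 0.5 * Hc((3,), (0,))
print(f"(8) = {eq8:.6f}, (11) = {eq11:.6f}")
```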
Our first idea is to introduce a periodic erasure channel 
where every third packet is erased, i.e., $g_t = \star$ for $t = 3k$, 
$k = 0, 1, 2, \ldots$. We claim that even on such a channel 
every third source sequence must be recovered. Suppose the 
destination does not receive $f_0$ but observes $g_1 = f_1$ and $g_2 = f_2$. 
It must recover $s_2^n$ at $t = 2$. At this point, because of the 
Markov nature of the source process, it becomes synchronized 
with the encoder, i.e., the effect of earlier erasures is no longer 
relevant. Thus it treats the new erasure at time $t = 3$ as the 
only erasure it has observed so far. Upon receiving $f_4$ and 
$f_5$ it must recover $s_5^n$ at $t = 5$. More generally, it is able 
to recover $s_{3k+2}^n$ at $t = 3k+2$ upon sequentially observing 
$\{f_{3i+1}, f_{3i+2}\}_{0 \le i \le k}$ and missing $\{f_{3i}\}_{0 \le i \le k}$. From the source 
coding theorem we must have 

\begin{align}
2knR &\ge H(f_1, f_2, f_4, f_5, \ldots, f_{3k-2}, f_{3k-1}) \tag{12}\\
&\ge H(s_2^n, s_5^n, \ldots, s_{3k-1}^n) \tag{13}\\
&\ge n(k-1) H(s_3 \mid s_0), \tag{14}
\end{align}

which, upon taking $k \to \infty$, yields $R \ge \tfrac{1}{2} H(s_3 \mid s_0)$. 

The above argument only takes into account one constraint: 
when there is an erasure, the destination needs to recover 
the source sequences with W = 1. Hence it is missing the 
term $\tfrac{1}{2} H(s_1 \mid s_0, s_2)$ that appears in (11). To recover this 
term, we need to take into account the second constraint: 
in the absence of erasures, the destination must recover each 
source sequence instantaneously. 

Our second key idea is to introduce a multi-terminal 
source coding problem with one encoder and two decoders 
that simultaneously captures both these constraints. This is 
illustrated in Fig. 2. The encoder is revealed $(s_{3k+1}^n, s_{3k+2}^n)$ 
and produces outputs $f_{3k+1}$ and $f_{3k+2}$. Decoder 1 needs to 
recover $s_{3k+1}^n$ given $f_{3k+1}$ and $s_{3k}^n$, while decoder 2 needs to 
recover $s_{3k+2}^n$ given $s_{3k-1}^n$ and $(f_{3k+1}, f_{3k+2})$. Thus decoder 1 
corresponds to the steady-state behaviour of the system when 
there is no loss, while decoder 2 corresponds to the recovery 
immediately after an erasure. For the above multi-terminal 
problem, we establish a simple lower bound on the symmetric 
rate $R = \frac{1}{n} H(f_{3k+1}) = \frac{1}{n} H(f_{3k+2})$ as follows: 

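\begin{align}
2nR &\ge H(f_{3k+1}, f_{3k+2}) \notag\\
&\ge H(f_{3k+1}, f_{3k+2} \mid s_{3k-1}^n) \tag{15}\\
&= H(f_{3k+1}, f_{3k+2}, s_{3k+2}^n \mid s_{3k-1}^n) - H(s_{3k+2}^n \mid f_{3k+1}, f_{3k+2}, s_{3k-1}^n) \tag{16}\\
&\ge H(f_{3k+1}, s_{3k+2}^n \mid s_{3k-1}^n) - n\varepsilon_n \tag{17}\\
&= H(s_{3k+2}^n \mid s_{3k-1}^n) + H(f_{3k+1} \mid s_{3k+2}^n, s_{3k-1}^n) - n\varepsilon_n \notag\\
&\ge H(s_{3k+2}^n \mid s_{3k-1}^n) + H(f_{3k+1} \mid s_{3k+2}^n, s_{3k}^n, s_{3k-1}^n) - n\varepsilon_n \tag{18}\\
&\ge H(s_{3k+2}^n \mid s_{3k-1}^n) + H(s_{3k+1}^n \mid s_{3k+2}^n, s_{3k}^n, s_{3k-1}^n) - 2n\varepsilon_n \tag{19}\\
&= H(s_{3k+2}^n \mid s_{3k-1}^n) + H(s_{3k+1}^n \mid s_{3k+2}^n, s_{3k}^n) - 2n\varepsilon_n \tag{20}\\
&= nH(s_3 \mid s_0) + nH(s_1 \mid s_2, s_0) - 2n\varepsilon_n \tag{21}
\end{align}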

where (17) follows from the fact that $s_{3k+2}^n$ must be recovered 
from $(f_{3k+1}, f_{3k+2}, s_{3k-1}^n)$ at decoder 2, hence Fano's inequality 
applies, and (18) follows from the fact that conditioning 
reduces entropy. Step (19) follows from Fano's inequality applied 
to decoder 1 and, finally, (20) follows from the Markov chain 
associated with the source process. Dividing both sides of (21) by 
$2n$ and taking $n \to \infty$ recovers the desired lower 
bound (11). 

To apply the above lower bound in the streaming setup, 
we need to take into account that the decoders have access 
to codeword indices rather than side-information sequences. 
Furthermore, the encoder has access to all the past source 
sequences. The formal proof of the lower bound is presented 
in Sec. IV. While inspired by the above ideas, it is somewhat 
more direct. 

B. Linear Semi-Deterministic Markov Sources 

We propose a special class of source models, linear 
semi-deterministic Markov sources, for which the lower bound 
in (6) is tight. Our proposed coding scheme is most natural 
for a subclass of deterministic sources defined below. 

Definition 1. (Linear Diagonally Correlated Deterministic 
Sources) The alphabet of a linear diagonally correlated 
deterministic source consists of K + 1 sub-symbols, i.e., 

$$ s_i = (s_{i,0}, \ldots, s_{i,K}) \in \mathcal{S}_0 \times \mathcal{S}_1 \times \cdots \times \mathcal{S}_K, \tag{22} $$

where each $\mathcal{S}_j = \{0,1\}^{N_j}$ is the set of binary sequences of length $N_j$. Suppose 
that the sub-sequence $\{s_{i,0}\}_{i \ge 0}$ is an i.i.d. sequence sampled 
uniformly over $\mathcal{S}_0$ and, for $1 \le j \le K$, the sub-symbol $s_{i,j}$ is 
a linear deterministic function of $s_{i-1,j-1}$ (all multiplication is over the binary field): 

$$ s_{i,j} = R_{j,j-1}\, s_{i-1,j-1}, \quad 1 \le j \le K, \tag{23} $$

for fixed matrices $R_{1,0}, R_{2,1}, \ldots, R_{K,K-1}$, each of full row 
rank, i.e., $\mathrm{rank}(R_{j,j-1}) = N_j$. 

For such a class of sources we establish that the lower bound 
in Theorem 1 is tight and the binning-based scheme is 
sub-optimal. 

Proposition 1. For the class of linear diagonally correlated 
deterministic sources in Def. 1, the rate-recovery function is 
given by: 

\begin{align}
R(B, W) = R^-(B, W) &= H(s_1 \mid s_0) + \frac{1}{W+1} I(s_B; s_{B+W+1} \mid s_0) \tag{24}\\
&= N_0 + \frac{1}{W+1} \sum_{k=1}^{\min\{K-W,\,B\}} N_{W+k}. \tag{25}
\end{align}

Sec. V provides the proof of Prop. 1. Our coding scheme 
exploits the special structure of such sources and achieves 
a rate that is strictly lower than that of the binning-based scheme. 
We call this technique prospicient coding because it exploits 
non-causal knowledge of some future symbols. We make the 
following remark, which will be established in the sequel. 

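The closed form (25) can be cross-checked against (24) by counting innovation bits. Given $s_m$, sub-symbol j of $s_t$ is unknown exactly when it was generated from an innovation injected after time m. In that case it is uniform on $N_j$ bits, because a product of the full-row-rank matrices in (23) again has full row rank. The sketch below evaluates both expressions under this counting argument. The helper names and the example sizes $N_j$ are hypothetical.

```python
def rate_closed_form(N, B, W):
    """(25): N_0 + (1/(W+1)) * sum_{k=1}^{min(K-W, B)} N_{W+k}, with N = [N_0, ..., N_K]."""
    K = len(N) - 1
    return N[0] + sum(N[W + k] for k in range(1, min(K - W, B) + 1)) / (W + 1)


def unknown_bits(N, gap):
    """H(s_t | s_{t-gap}) in bits: sub-symbols j <= gap - 1 carry fresh innovations."""
    K = len(N) - 1
    return sum(N[j] for j in range(min(K, gap - 1) + 1))


def rate_from_entropies(N, B, W):
    """(24): H(s_1|s_0) + I(s_B; s_{B+W+1} | s_0) / (W+1), via the Markov property."""
    mutual = unknown_bits(N, B + W + 1) - unknown_bits(N, W + 1)
    return unknown_bits(N, 1) + mutual / (W + 1)


if __name__ == "__main__":
    N = [4, 3, 3, 2, 1]  # hypothetical sizes N_0 >= N_1 >= ... >= N_K (K = 4)
    for W in range(4):
        print(W, rate_closed_form(N, B=2, W=W), rate_from_entropies(N, B=2, W=W))
```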
Remark 1. In the proof of the coding theorem for Prop. 1, it 
suffices to consider the case when K = B + W. The extension 
to the case when K < B + W is trivial and the extension to 
the case when K > B + W also follows in a straightforward 
manner. 

The proposed coding scheme can also be generalized to a 
broader class of semi-deterministic sources. 

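Definition 2. (Linear Semi-Deterministic Sources) The alphabet 
of a linear semi-deterministic source consists of two 
sub-symbols, i.e., 

$$ s_i = (s_{i,0}, s_{i,1}) \in \mathcal{S}_0 \times \mathcal{S}_1, \tag{26} $$

where each $\mathcal{S}_j = \{0,1\}^{N_j}$ for $j = 0, 1$. The sequence $\{s_{i,0}\}$ 
is an i.i.d. sequence sampled uniformly over $\mathcal{S}_0$, whereas 

$$ s_{i,1} = \begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} s_{i-1,0} \\ s_{i-1,1} \end{bmatrix} \tag{27} $$

for some fixed matrices A and B. 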



We show that through a suitable linear transform, which is 
both invertible and memoryless, this apparently more general 
source model can be transformed into a diagonally correlated 
deterministic Markov source. Prospicient coding can then be 
applied to this class. 

The assumption of linearity in Def. 1 is not required to achieve the 
lower bound. However, we use linearity to generalize to the class of 
semi-deterministic sources in Thm. 2. 

Since each sub-symbol is a (fixed-length) binary sequence, we use the 
bold-face font $s_{i,j}$ to represent it. Similarly, since each source symbol is a 
collection of sub-symbols, we use a bold-face font to represent it. This should 
not be confused with a length-n source sequence at time i, which will be 
represented as $s_i^n$. 
Theorem 2. For the class of linear semi-deterministic 
sources in Def. 2, the rate-recovery function is given by: 

$$ R(B, W) = R^-(B, W) = H(s_1 \mid s_0) + \frac{1}{W+1} I(s_B; s_{B+W+1} \mid s_0). \tag{28} $$

The proof of Theorem 2 is provided in Sec. VI. 

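For a linear source over the binary field, the entropies in (28) reduce to matrix ranks. With uniform innovations, $H(s_t \mid s_m)$ equals the GF(2) rank of the linear map from the innovations $s_{m+1,0}, \ldots, s_{t,0}$ to $s_t$. The illustrative sketch below evaluates the right-hand side of (28) in this way. The function names are hypothetical, and `Bm` stands for the matrix B in (27), renamed to avoid a clash with the burst length.

```python
def gf2_rank(vectors):
    """Rank over GF(2) of a list of bit vectors encoded as Python ints."""
    rank, vecs = 0, [v for v in vectors if v]
    while vecs:
        pivot = max(vecs)                 # vector with the highest leading bit
        top = pivot.bit_length() - 1
        # Clear that leading bit everywhere (the pivot itself becomes 0).
        vecs = [v ^ pivot if (v >> top) & 1 else v for v in vecs]
        vecs = [v for v in vecs if v]
        rank += 1
    return rank


def mat_mul(X, Y):
    """Product of 0/1 matrices (lists of rows) over GF(2)."""
    return [[sum(X[i][k] & Y[k][j] for k in range(len(Y))) % 2
             for j in range(len(Y[0]))] for i in range(len(X))]


def column_masks(M):
    """Encode each column of a 0/1 matrix as an int bit vector."""
    return [sum(M[i][j] << i for i in range(len(M))) for j in range(len(M[0]))]


def cond_entropy_bits(F, E, gap):
    """H(x_t | x_{t-gap}) = rank over GF(2) of [E, F E, ..., F^(gap-1) E]."""
    masks, block = [], E
    for _ in range(gap):
        masks += column_masks(block)
        block = mat_mul(F, block)
    return gf2_rank(masks)


def theorem2_rate(A, Bm, B, W):
    """Right-hand side of (28) for s_{i,1} = A s_{i-1,0} + Bm s_{i-1,1} over GF(2)."""
    N1, N0 = len(A), len(A[0])
    n = N0 + N1
    # State x_i = (s_{i,0}, s_{i,1}) evolves as x_i = F x_{i-1} + E s_{i,0}.
    F = [[0] * n for _ in range(N0)] + [A[r] + Bm[r] for r in range(N1)]
    E = [[int(i == j) for j in range(N0)] for i in range(n)]
    h1 = cond_entropy_bits(F, E, 1)  # H(s_1 | s_0) = N_0
    mutual = cond_entropy_bits(F, E, B + W + 1) - cond_entropy_bits(F, E, W + 1)
    return h1 + mutual / (W + 1)
```

With `Bm` set to zero, the model reduces to a diagonally correlated source with K = 1. The result can then be compared against (25).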
C. Gaussian Sources 

Our proposed framework can be easily extended to a con- 
tinuous valued source process with a fidelity measure. While a 
complete treatment of the lossy case is beyond the scope of the 
present paper, we study one natural extension of the diagonally 
correlated source model in Def. 1 to Gaussian sources. 

Consider a Gaussian source process that is i.i.d. in both the 
temporal and spatial dimensions, i.e., at time i, a sequence $s_i^n$ 
consisting of n symbols is sampled i.i.d. according to a 
zero-mean, unit-variance Gaussian distribution $\mathcal{N}(0, 1)$. 

The encoder's output at time i is denoted by the index 
$f_i = \mathcal{F}_i(s_0^n, \ldots, s_i^n) \in [1, 2^{nR}]$ as before. At time i, upon receiving 
the channel outputs until time i, the decoder is interested in 
reproducing the current source sequence and the past K source sequences, 

$$ (s_i^n, s_{i-1}^n, \ldots, s_{i-K}^n), \tag{29} $$

within a distortion vector $d = (d_0, d_1, \ldots, d_K)^T$. 
Thus, for any $i \ge 0$ and $0 \le j \le K$, if $\hat{s}_{i-j}^n$ is 
the reconstruction sequence of $s_{i-j}^n$ at time i, we must 
have that $E[\|s_{i-j}^n - \hat{s}_{i-j}^n\|^2] \le n d_j$. We will assume that 
$d_0 \le d_1 \le \cdots \le d_K$ holds. Furthermore, as will be discussed 
in the coding theorem, it suffices to restrict K = B + W. 

As before, the channel can introduce an erasure burst of 
length B in an arbitrary interval $[j, j+B-1]$. The decoder 
is not required to output a reproduction of the sequences $s_i^n$ 
for $i \in [j, j+B+W-1]$. The lossy rate-recovery function, 
denoted by $R(B, W, d)$, is the minimum rate required to satisfy 
these constraints. 

Theorem 3. For the Gaussian source model with a distortion 
vector $d = (d_0, \ldots, d_K)$ with $0 < d_j < 1$, the lossy rate-recovery 
function is given by 

$$ R(B, W, d) = \frac{1}{2}\log\frac{1}{d_0} + \frac{1}{W+1} \sum_{k=1}^{\min\{K-W,\,B\}} \frac{1}{2}\log\frac{1}{d_{W+k}}. \tag{30} $$



The coding scheme for the proposed model involves using 
a successively refinable code for each sequence $s_i^n$ to produce 
K + 1 layers and mapping the sequence of layered codewords 
to a diagonally correlated deterministic source. The proof of 
Theorem 3 is provided in Sec. VII. 

All logarithms are taken to base 2. 
Fig. 3. Comparison of the rate-recovery of sub-optimal systems with the minimum 
possible rate-recovery function for different recovery window lengths W. 



In Fig. 3, the rate-recovery functions of various schemes 
are compared. The sub-optimal schemes considered are 
still-image compression (SI), Wyner-Ziv compression with 
delayed side-information (WZ) and predictive coding plus 
FEC (FEC), which are studied in detail in Sec. VII-D. 
We assume K = 5, B = 2 and the distortion vector 
$d = (0.1, 0.25, 0.4, 0.55, 0.7, 0.85)^T$. It can be observed from 
Fig. 3 that, except when W = 0, none of the other schemes 
is optimal. The predictive coding plus FEC scheme, which 
is a natural separation-based scheme, is sub-optimal even for 
relatively large values of W. 

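The sketch below evaluates the optimal rate for the Fig. 3 parameters (K = 5, B = 2 and the stated distortion vector). It assumes that (30) reads $R(B, W, d) = \frac{1}{2}\log_2\frac{1}{d_0} + \frac{1}{W+1}\sum_{k=1}^{\min\{K-W,B\}} \frac{1}{2}\log_2\frac{1}{d_{W+k}}$. The sketch reproduces only the optimal curve. The SI, WZ and FEC schemes are analyzed in Sec. VII-D and are not modeled here. The function name is hypothetical.

```python
import math


def gaussian_rate(B, W, d):
    """Assumed form of (30), in bits per sample; d = (d_0, ..., d_K)."""
    K = len(d) - 1
    base = 0.5 * math.log2(1 / d[0])  # rate of a description of s_i at distortion d_0
    extra = sum(0.5 * math.log2(1 / d[W + k]) for k in range(1, min(K - W, B) + 1))
    return base + extra / (W + 1)


if __name__ == "__main__":
    d = (0.1, 0.25, 0.4, 0.55, 0.7, 0.85)  # K = 5, as in Fig. 3
    for W in range(6):
        print(f"W={W}: R = {gaussian_rate(B=2, W=W, d=d):.3f} bits/sample")
```

The rate decreases monotonically in W toward $\frac{1}{2}\log_2(1/d_0) \approx 1.66$ bits per sample. This mirrors the $1/(W+1)$ behaviour of the lossless results.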
D. Symmetric Sources 

A symmetric source is defined as a Markov source such that 
the underlying Markov chain is also reversible, i.e., the random 
variables satisfy $(s_0, \ldots, s_t) \overset{d}{=} (s_t, \ldots, s_0)$, where the equality 
is in the sense of distribution [14]. Of particular interest to us 
is the following property, satisfied for each $t \ge 1$: 

$$ p_{s_{t-1}, s_t}(s_a, s_b) = p_{s_t, s_{t-1}}(s_a, s_b), \quad \forall s_a, s_b \in \mathcal{S}, \tag{31} $$

i.e., we can "exchange" the source pair $(s_{t-1}^n, s_t^n)$ with 
$(s_t^n, s_{t-1}^n)$ without affecting the joint distribution. An important 
class of symmetric sources is the binary sources: 
$s_t^n = s_{t-1}^n \oplus z_t^n$, where $\{z_t^n\}_{t \ge 1}$ is an i.i.d. binary source 
process (in both temporal and spatial dimensions) with the 
marginal distribution $\Pr(z_{t,i} = 0) = p$, the marginal distribution 
of the source is $\Pr(s_{t,i} = 0) = \Pr(s_{t,i} = 1) = \frac{1}{2}$, and $\oplus$ denotes 
modulo-2 addition. 

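For a stationary chain with marginal $p_s$, property (31) is the detailed-balance condition $p_s(a)\Pr(s_t = b \mid s_{t-1} = a) = p_s(b)\Pr(s_t = a \mid s_{t-1} = b)$. The sketch below checks this condition for the binary source above, assuming a flip probability p (that is, $\Pr(z_t = 1) = p$). For contrast it also checks two other chains. The helper names and example chains are hypothetical.

```python
def satisfies_31(pi, P, tol=1e-12):
    """Check p_{s_{t-1}, s_t}(a, b) == p_{s_t, s_{t-1}}(a, b) for all a, b,
    i.e. pi[a] * P[a][b] == pi[b] * P[b][a] (detailed balance)."""
    m = len(pi)
    return all(abs(pi[a] * P[a][b] - pi[b] * P[b][a]) <= tol
               for a in range(m) for b in range(m))


def is_stationary(pi, P, tol=1e-12):
    """Check that pi P == pi."""
    m = len(pi)
    return all(abs(sum(pi[a] * P[a][b] for a in range(m)) - pi[b]) <= tol
               for b in range(m))


def binary_source(p):
    """s_t = s_{t-1} XOR z_t with Pr(z_t = 1) = p: uniform marginal, symmetric P."""
    return [0.5, 0.5], [[1 - p, p], [p, 1 - p]]


if __name__ == "__main__":
    examples = {
        "binary, p = 0.1": binary_source(0.1),
        # birth-death chain: reversible with a non-uniform marginal
        "birth-death": ([0.25, 0.5, 0.25],
                        [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]),
        # doubly stochastic but cyclically biased: stationary, not reversible
        "cyclic": ([1 / 3] * 3,
                   [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]),
    }
    for name, (pi, P) in examples.items():
        print(name, is_stationary(pi, P), satisfies_31(pi, P))
```

The cyclic example shows that stationarity alone does not imply (31).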


Theorem 4. For the class of symmetric sources that 
satisfy (31), the rate-recovery function, restricted to the class of 
memoryless encoders, is given by 

$$ R(B, W) = \frac{1}{W+1} H(s_{B+1}, s_{B+2}, \ldots, s_{B+W+1} \mid s_0). \tag{32} $$

Note that the achievability follows immediately from (7). 
Thus it only remains to show that the lower bound (6) can be 
improved. We have only been able to obtain this improvement 
for the class of memoryless encoders. For the general encoder 
structure (3), this remains an open problem. At first glance one 
may expect that when memoryless encoders are considered, 
the binning-based scheme is always optimal. Interestingly, 
this is not true. The prospicient encoders for the diagonally 
correlated source models in Section III-B are memoryless and 
yet improve upon the binning-based upper bound. Our proof 
only applies to the class of symmetric sources. 

Our proof, presented in Sec. VIII, involves an interesting 
connection to a multi-terminal source coding problem called 
zig-zag source coding [12], [15], [16]. In particular, we develop 
a simple approach to lower bound the sum-rate of a zig-zag 
source coding network with symmetric sources that may be of 
independent interest. 

E. Delay-Constrained Decoder 

As a variation of the problem setup in Section II, where 
instantaneous recovery of each source sequence is desired, 
we consider a delay-constrained decoder in this section. A 
receiver with a delay constraint of T recovers 

$$ \hat{s}_i^n = \mathcal{G}_i(g_0, g_1, \ldots, g_{i+T}), \quad i \notin \{j, \ldots, j+B+W-1\}, \tag{33} $$

if an erasure burst of length B occurs in the interval $[j, j+B-1]$. 
The rest of the setup does not change. The rate-recovery 
function is a function of three parameters, W, B and T, i.e., 
$R(B, W, T)$. Note that T = 0 reduces to the case treated in 
the rest of the paper. 

Theorem 5. The rate-recovery function when W = 0 is given 
by $R_0(B, T) = R(B, W = 0, T)$, where 

\begin{align}
R_0(B, T) &= \frac{1}{T+1} H(s_{B+1} \mid s_0) + \frac{T}{T+1} H(s_1 \mid s_0) \tag{34}\\
&= \frac{1}{T+1} H(s_{B+1}, s_{B+2}, \ldots, s_{B+T+1} \mid s_0). \tag{35}
\end{align}

The minimum rate is achieved by applying a Slepian-Wolf 
code to each source sequence and jointly decoding the source 
sequences $(s_{j+B}^n, s_{j+B+1}^n, \ldots, s_{j+B+T}^n)$ at time $j + B + T$, 
following a burst spanning $\{j, j+1, \ldots, j+B-1\}$. 

The complete proof of Theorem 5 is provided in Sec. IX. 

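The equality of (34) and (35) follows from the chain rule, the Markov property and stationarity. The brute-force check below uses an assumed three-state stationary chain and hypothetical helper names. It also illustrates that the rate does not increase with the decoding delay T.

```python
import itertools
import math
from collections import defaultdict

PI = [1 / 3, 1 / 3, 1 / 3]  # stationary for the doubly stochastic P below (assumed)
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]


def path_pmf(length):
    """Joint pmf of (s_0, ..., s_{length-1}) started from the stationary pmf."""
    pmf = {}
    for path in itertools.product(range(len(PI)), repeat=length):
        prob = PI[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        pmf[path] = prob
    return pmf


def Hc(pmf, a, b):
    """H(s_a | s_b) in bits for collections of time indices a and b."""
    def H(idx):
        marg = defaultdict(float)
        for path, prob in pmf.items():
            marg[tuple(path[i] for i in idx)] += prob
        return -sum(q * math.log2(q) for q in marg.values() if q > 0)
    return H(list(a) + list(b)) - H(list(b))


def r0_eq34(B, T):
    pmf = path_pmf(B + 2)
    return Hc(pmf, [B + 1], [0]) / (T + 1) + T * Hc(pmf, [1], [0]) / (T + 1)


def r0_eq35(B, T):
    pmf = path_pmf(B + T + 2)
    return Hc(pmf, range(B + 1, B + T + 2), [0]) / (T + 1)
```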
IV. General Upper and Lower Bounds on 
Rate-Recovery Function 

We first establish the achievability of $R^+(B, W)$ and then 
the lower bound $R^-(B, W)$ in Theorem 1. 

A. Achievability 

From Corollary 1, it suffices to show that 

$$ R^+ = \frac{1}{W+1} H(s_{B+1}, s_{B+2}, \ldots, s_{B+W+1} \mid s_0) + \epsilon \tag{36} $$

is achievable for any arbitrary $\epsilon > 0$. 

We use a Slepian-Wolf codebook, which is generated 
by randomly partitioning the set of all typical sequences [11] 
$\mathcal{T}^n(s)$ into $2^{nR^+}$ bins. For each $i \ge 0$ the partitioning is done 
independently and all the partitions are revealed to the decoder 
ahead of time. 

Upon observing $s_i^n$, the encoder declares an error if 
$s_i^n \notin \mathcal{T}^n(s)$. Otherwise it finds the bin to which $s_i^n$ belongs 
and sends the corresponding bin index $f_i$. We consider two 
cases for recovering $s_i^n$ at time $t = i$. First, suppose that the 
sequence $s_{i-1}^n$ has already been recovered. Then the 
destination attempts to recover $s_i^n$ from $(f_i, s_{i-1}^n)$. This succeeds 
with high probability if $R^+ > H(s_1 \mid s_0)$, which is guaranteed 
via (36). Next, suppose that $s_{i-1}^n$ has not been recovered 
by the destination but $s_i^n$ needs to be recovered. This only 
happens when $s_i^n$ is the first sequence to be recovered after 
the erasure burst. In particular, the erasure burst must span 
$[i - B' - W, i - W - 1]$ for some $B' \le B$. The 
decoder thus has access to $s_{i-B'-W-1}^n$ from before the start of 
the erasure burst. Upon receiving $f_{i-W}, \ldots, f_i$, the destination 
simultaneously attempts to recover $(s_{i-W}^n, \ldots, s_i^n)$ given 
$(s_{i-B'-W-1}^n, f_{i-W}, \ldots, f_i)$. This succeeds with high 
probability if 

$$ \sum_{j=i-W}^{i} R^+ > H(s_{i-W}, \ldots, s_i \mid s_{i-W-B'-1}). \tag{37} $$

By the Markov property, the right-hand side of (37) does not exceed 
$H(s_{i-W}, \ldots, s_i \mid s_{i-W-B-1}) = H(s_{B+1}, \ldots, s_{B+W+1} \mid s_0)$, so (37) 
is also guaranteed by (36). 

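The first decoding case above can be illustrated with a toy experiment. The sketch below is not the codebook used in the proof. It swaps in random linear binning, where the bin index of a binary block s is the syndrome $Hs$ over GF(2) for a random matrix H. It assumes the binary symmetric source of Section III-D with side information $s_{i-1}^n$. The decoder searches the bin for the sequence closest in Hamming distance to the side information. Block lengths are tiny, so the sketch shows the mechanics rather than the asymptotic rate $H(s_1 \mid s_0)$. All names are hypothetical.

```python
import itertools
import random


def make_binning(n, m, seed=0):
    """Random m x n binary matrix; the syndrome H s (mod 2) is the bin index of s."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]


def bin_index(Hm, s):
    return tuple(sum(h & x for h, x in zip(row, s)) % 2 for row in Hm)


def decode(Hm, f, side_info):
    """Return the sequence in bin f closest (Hamming distance) to the side information."""
    best, best_dist = None, len(side_info) + 1
    for cand in itertools.product((0, 1), repeat=len(side_info)):
        if bin_index(Hm, cand) != f:
            continue
        dist = sum(a != b for a, b in zip(cand, side_info))
        if dist < best_dist:
            best, best_dist = cand, dist
    return best


def trial(n=10, m=7, p=0.05, seed=1):
    """One steady-state step: s_i = s_{i-1} XOR z_i, and the decoder knows s_{i-1}."""
    rng = random.Random(seed)
    prev = tuple(rng.randint(0, 1) for _ in range(n))
    cur = tuple(b ^ (rng.random() < p) for b in prev)
    Hm = make_binning(n, m, seed)
    return decode(Hm, bin_index(Hm, cur), prev) == cur


if __name__ == "__main__":
    successes = sum(trial(seed=s) for s in range(20))
    print(f"recovered s_i in {successes} of 20 trials")
```

Exhaustive search over the bin keeps the sketch short but scales as $2^n$. A practical system would use a structured (e.g., syndrome-decodable) code instead.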


B. Lower Bound 



Our proof is an extension of the intuition developed in 
Section III-A. 

For any sequence of $(n, 2^{nR})$ codes we show that there is 
a sequence $\varepsilon_n$ that vanishes as $n \to \infty$ such that 

$$ R \ge H(s_1 \mid s_0) + \frac{1}{W+1} I(s_p; s_B \mid s_0) - (W+1)\varepsilon_n, \tag{38} $$

where throughout we let p = B + W + 1. 

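We consider a periodic erasure channel of period p where 
the first B packets of each period are erased, i.e., for each $k \ge 0$, 
an erasure happens in the time interval $t \in \{kp, kp+1, \ldots, kp+B-1\}$. 
Writing $f_a^b = (f_a, \ldots, f_b)$, consider: 

\begin{align}
(W+1)n(t+1)R &\ge H\big(f_B^{p-1}, f_{p+B}^{2p-1}, \ldots, f_{(t-1)p+B}^{tp-1}, f_{tp+B}^{(t+1)p-1}\big) \tag{39}\\
&= H(f_B^{p-1}) + \sum_{k=1}^{t} H\big(f_{kp+B}^{(k+1)p-1} \mid f_B^{p-1}, f_{p+B}^{2p-1}, \ldots, f_{(k-1)p+B}^{kp-1}\big) \notag\\
&\ge H(f_B^{p-1}) + \sum_{k=1}^{t} H\big(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}\big), \tag{40}
\end{align}

where (39) holds because each of the $(W+1)(t+1)$ unerased indices takes at most 
$2^{nR}$ values, and the last step follows from the fact that conditioning 
reduces entropy. 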

We bound the term $H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1})$ for each $k \ge 1$. 

By definition, the source sequence $s_{(k+1)p-1}^n$ must be 
recovered from $(f_0^{kp-1}, f_{kp+B}, f_{kp+B+1}, \ldots, f_{(k+1)p-1})$. Applying 
Fano's inequality we have that 

$$ H(s_{(k+1)p-1}^n \mid f_0^{kp-1}, f_{kp+B}, \ldots, f_{(k+1)p-1}) \le n\varepsilon_n \tag{41} $$
and 

$$ H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}) \ge H(s_{(k+1)p-1}^n \mid f_0^{kp-1}) + H(f_{kp+B}^{(k+1)p-1} \mid s_{(k+1)p-1}^n, f_0^{kp-1}) - n\varepsilon_n, \tag{42} $$

where (42) follows from applying Fano's inequality (41). 
Now we bound each of the two terms in (42). First, we note 
that: 

\begin{align}
H(s_{(k+1)p-1}^n \mid f_0^{kp-1}) &\ge H(s_{(k+1)p-1}^n \mid f_0^{kp-1}, s_{kp-1}^n) \tag{43}\\
&= H(s_{(k+1)p-1}^n \mid s_{kp-1}^n) = nH(s_p \mid s_0), \tag{44}
\end{align}

where the last step follows from the Markov relation 
$f_0^{kp-1} \to s_{kp-1}^n \to s_{(k+1)p-1}^n$. 

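Furthermore, the second term in (42) can be lower bounded 
using the following series of inequalities: 

\begin{align}
H(f_{kp+B}^{(k+1)p-1} \mid s_{(k+1)p-1}^n, f_0^{kp-1})
&\ge H(f_{kp+B}^{(k+1)p-2} \mid s_{(k+1)p-1}^n, f_0^{kp-1}) \tag{45}\\
&\ge H(f_{kp+B}^{(k+1)p-2} \mid s_{(k+1)p-1}^n, f_0^{kp+B-1}) \tag{46}\\
&= H(f_{kp+B}^{(k+1)p-2}, s_{kp+B}^n, \ldots, s_{(k+1)p-2}^n \mid s_{(k+1)p-1}^n, f_0^{kp+B-1}) \notag\\
&\quad - H(s_{kp+B}^n, \ldots, s_{(k+1)p-2}^n \mid f_0^{(k+1)p-2}, s_{(k+1)p-1}^n) \tag{47}\\
&\ge H(s_{kp+B}^n, \ldots, s_{(k+1)p-2}^n \mid s_{(k+1)p-1}^n, f_0^{kp+B-1}) - Wn\varepsilon_n \tag{48}\\
&\ge H(s_{kp+B}^n, \ldots, s_{(k+1)p-2}^n \mid s_{(k+1)p-1}^n, f_0^{kp+B-1}, s_{kp+B-1}^n) - Wn\varepsilon_n \notag\\
&= H(s_{kp+B}^n, \ldots, s_{(k+1)p-2}^n \mid s_{(k+1)p-1}^n, s_{kp+B-1}^n) - Wn\varepsilon_n \tag{49}\\
&= nH(s_{B+1}, s_{B+2}, \ldots, s_{p-1} \mid s_B, s_p) - Wn\varepsilon_n \notag\\
&= nH(s_{B+1}, s_{B+2}, \ldots, s_{p-1}, s_p \mid s_B) - nH(s_p \mid s_B) - Wn\varepsilon_n \notag\\
&= n(W+1)H(s_1 \mid s_0) - nH(s_p \mid s_B) - Wn\varepsilon_n, \tag{50}
\end{align}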
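where (45) and (46) follow because dropping $f_{(k+1)p-1}$ and conditioning on the 
additional indices $f_{kp}, \ldots, f_{kp+B-1}$ can only reduce entropy; (48) follows from the fact that 
$\{s_{kp+B}^n, \ldots, s_{(k+1)p-2}^n\}$ must be decoded from $f_0^{(k+1)p-2}$, 
and hence Fano's inequality again applies; adding $s_{kp+B-1}^n$ to the conditioning 
can only reduce entropy; and (49) follows from the Markov chain 

$$ f_0^{kp+B-1} \to s_{kp+B-1}^n \to (s_{kp+B}^n, \ldots, s_{(k+1)p-2}^n, s_{(k+1)p-1}^n). $$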



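Combining (42), (44) and (50), we have that 

\begin{align}
H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}) &\ge H(s_{(k+1)p-1}^n \mid f_0^{kp-1}) + H(f_{kp+B}^{(k+1)p-1} \mid s_{(k+1)p-1}^n, f_0^{kp-1}) - n\varepsilon_n \tag{51}\\
&\ge nH(s_p \mid s_0) + n(W+1)H(s_1 \mid s_0) - nH(s_p \mid s_B) - (W+1)n\varepsilon_n. \tag{52}
\end{align}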
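Finally, substituting (52) into (40), we have that 

\begin{align}
(W+1)n(t+1)R &\ge H(f_B^{p-1}) - t(W+1)n\varepsilon_n + ntH(s_p \mid s_0) \notag\\
&\quad + nt(W+1)H(s_1 \mid s_0) - ntH(s_p \mid s_B) \tag{53}\\
&= H(f_B^{p-1}) - t(W+1)n\varepsilon_n + nt\big((W+1)H(s_1 \mid s_0) + I(s_p; s_B \mid s_0)\big), \tag{54}
\end{align}

where (54) uses $H(s_p \mid s_0) - H(s_p \mid s_B) = I(s_p; s_B \mid s_0)$, which holds by the 
Markov chain $s_0 \to s_B \to s_p$. 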

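Dividing both sides of (54) by $(W+1)n(t+1)$, dropping the nonnegative term $H(f_B^{p-1})$, and taking $n \to \infty$ and then $t \to \infty$, we recover the lower bound $R^-(B, W)$ in (6). 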

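Two purely information-theoretic steps in this proof can be verified numerically. The first is the chain of equalities that leads from $H(s_{B+1}, \ldots, s_{p-1} \mid s_B, s_p)$ to the right-hand side of (50). The second is the identity $H(s_p \mid s_0) - H(s_p \mid s_B) = I(s_p; s_B \mid s_0)$ used in (54). The sketch below is illustrative only. The three-state chain and the helper names are assumptions, not part of the proof.

```python
import itertools
import math
from collections import defaultdict

PI = [1 / 3, 1 / 3, 1 / 3]  # stationary for the doubly stochastic P below (assumed)
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]


def path_pmf(length):
    """Joint pmf of (s_0, ..., s_{length-1}) started from the stationary pmf."""
    pmf = {}
    for path in itertools.product(range(len(PI)), repeat=length):
        prob = PI[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        pmf[path] = prob
    return pmf


def Hc(pmf, a, b):
    """H(s_a | s_b) in bits for collections of time indices a and b."""
    def H(idx):
        marg = defaultdict(float)
        for path, prob in pmf.items():
            marg[tuple(path[i] for i in idx)] += prob
        return -sum(q * math.log2(q) for q in marg.values() if q > 0)
    return H(list(a) + list(b)) - H(list(b))


def check(B, W):
    """Return the numerical gaps in the identities behind (50) and (54)."""
    p = B + W + 1
    pmf = path_pmf(p + 1)  # time indices 0, ..., p
    lhs50 = Hc(pmf, range(B + 1, p), [B, p])
    rhs50 = (W + 1) * Hc(pmf, [1], [0]) - Hc(pmf, [p], [B])
    mi = Hc(pmf, [p], [0]) - Hc(pmf, [p], [0, B])  # I(s_p; s_B | s_0)
    diff = Hc(pmf, [p], [0]) - Hc(pmf, [p], [B])
    return lhs50 - rhs50, mi - diff
```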


V. Diagonally Correlated Deterministic Sources 
We establish Prop. 1 in this section. 

A. Source Model 

We consider the semi-deterministic source model with a 
special diagonal correlation structure as described in Def. 1. 
The diagonal correlation structure appears to be the most 
natural structure to consider in developing insights into our 
proposed coding scheme. As we will see later in Theorem 2, 
the underlying coding scheme can also be generalized to a 
broader class of linear semi-deterministic sources. Furthermore, 
this class of semi-deterministic sources also provides a solution 
to the Gaussian source model, as discussed in Theorem 3. 

We first provide an alternate characterization of the sources 
defined in Def. 1. Let us define 

$$ R_{k,l} = R_{k,k-1} R_{k-1,k-2} \cdots R_{l+2,l+1} R_{l+1,l}, \tag{55} $$

where $k > l$. Note that since each $R_{j,j-1}$ is assumed to have 
full row rank (cf. Def. 1), the matrix $R_{k,l}$ is an $N_k \times N_l$ 
full-rank matrix of rank $N_k$. From Def. 1, 

$$ s_i = \begin{pmatrix} s_{i,0} \\ s_{i,1} \\ s_{i,2} \\ \vdots \\ s_{i,K} \end{pmatrix} = \begin{pmatrix} s_{i,0} \\ R_{1,0}\, s_{i-1,0} \\ R_{2,0}\, s_{i-2,0} \\ \vdots \\ R_{K,0}\, s_{i-K,0} \end{pmatrix}, \tag{56} $$

where $\{s_{i-K,0}, s_{i-K+1,0}, \ldots, s_{i,0}\}$ are the innovation sub-symbols 
of each source. This is illustrated in Fig. 4. Any diagonal 
in Fig. 4 consists of the same set of innovation bits. In 
particular, the innovation bits are introduced in the upper-left-most 
entry of the diagonal. As we traverse down, each 
sub-symbol consists of some fixed linear combinations of these 
innovation bits. Furthermore, the sub-symbol $s_{i,j}$ is completely 
determined given the sub-symbol $s_{i-1,j-1}$. 

In this section, we first argue that analyzing the coding scheme for the case K = B + W is sufficient. Then we explain the prospicient coding scheme, which achieves the rate specified in (25). Finally, the rate-optimality of the prospicient coding scheme is established by showing that the rate expression (25) equals the general lower bound in (24).

B. Sufficiency of K = B + W 

We first argue that for our coding scheme it suffices to assume that each source symbol consists of one innovation sub-symbol and a total of $K = B + W$ deterministic sub-symbols. In particular, when $K < B + W$, by simply appending $B + W - K$ all-zero sub-symbols, the source can be turned into a source with $B + W$ deterministic sub-symbols.

For the case $K > B + W$ we argue that it suffices to construct a coding scheme with $K = B + W$, since the remaining sub-symbols can be trivially computed by the receiver. In particular, at any time $i$, either $s_{i-1}$ or $s_{i-B-W-1}$ is guaranteed to be available to the destination. In the former case, all bits of $s_i$ except its innovation bits are known. Thus all the deterministic sub-symbols, including those with index $j > B + W$, can be computed. In the latter case, because of the diagonal structure of the source, the sub-symbols $s_{i,j}$ for $j \ge B + W + 1$ are deterministic functions of $s_{i-B-W-1}$ (cf. (56)), and therefore are known and can be ignored. Thus without loss of generality we assume that $K = B + W$.

Fig. 4. Schematic of Diagonally Correlated Deterministic Markov Source. The first row of sub-symbols are innovation symbols. They are generated independently of all past symbols. On each diagonal the sub-symbol is a deterministic function of the sub-symbols above it.



C. Prospicient Coding 

Our coding scheme is based on the following observation, illustrated in Fig. 5. Suppose that an erasure burst spans $[i - B - W, i - W - 1]$ and that, after the "don't care" period $[i - W, i - 1]$, we need to recover $s_i^n$. Based on the structure of the source, illustrated in Fig. 5, we make the following observations:

• Sub-symbols $s_{i,1}, \ldots, s_{i,W}$ can be directly computed from the innovation sub-symbols $s_{i-1,0}, \ldots, s_{i-W,0}$, respectively.

• Sub-symbols $s_{i,W+1}, \ldots, s_{i,W+B}$ can be computed from the sub-symbols $s_{i-W,1}, \ldots, s_{i-W,B}$, respectively.

Thus if we send the first $B + 1$ sub-symbols at each time, i.e., $x_i = (s_{i,0}, \ldots, s_{i,B})$, then we are guaranteed that the destination will be able to decode $s_i^n$ when an erasure happens in $[i - B - W, i - W - 1]$. To achieve the optimal rate, we compress further as discussed below. Our coding scheme consists of two steps.

1) Source Rearrangement: The source symbols, consisting of innovation and deterministic sub-symbols as in Def. 1, are first rearranged to produce an auxiliary set of codewords

$$c_i = \begin{pmatrix} c_{i,0}\\ c_{i,1}\\ c_{i,2}\\ \vdots\\ c_{i,B}\end{pmatrix} = \begin{pmatrix} s_{i,0}\\ s_{i+W,W+1}\\ s_{i+W,W+2}\\ \vdots\\ s_{i+W,W+B}\end{pmatrix} = \begin{pmatrix} s_{i,0}\\ R_{W+1,1}s_{i,1}\\ R_{W+2,2}s_{i,2}\\ \vdots\\ R_{W+B,B}s_{i,B}\end{pmatrix}, \qquad (57)$$

where the last relation follows from (56).

Note that the codeword consists of the innovation symbol $s_{i,0}$ as well as the symbols $s_{i+W,W+1}, \ldots, s_{i+W,W+B}$ that enable the recovery of symbols in $s_{i+W}$. The codeword $c_{i-W}$ consists of the green circles in Fig. 5.

It can be verified from (57) that the rate associated with the codewords is given by

$$R_0 = N_0 + \sum_{k=W+1}^{W+B} N_k, \qquad (58)$$

which is larger than the rate expression in (25). In particular, it is missing the factor $\frac{1}{W+1}$ in the second term. This factor can be recovered by binning the sequences $c_i^n$, as described next.
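As a quick numerical illustration of the gap between (58) and (25), the sketch below evaluates both expressions for a hypothetical choice of sub-symbol sizes $N_0 \ge N_1 \ge \cdots \ge N_{B+W}$; the sizes and the function names are illustrative assumptions, not values taken from the paper.

```python
from fractions import Fraction

def naive_rate(N, B, W):
    """Rate (58): send c_i uncompressed, N_0 + sum_{k=W+1}^{W+B} N_k bits per sample."""
    return N[0] + sum(N[k] for k in range(W + 1, W + B + 1))

def prospicient_rate(N, B, W):
    """Rate (25): N_0 + (1/(W+1)) * sum_{k=W+1}^{W+B} N_k bits per sample."""
    tail = sum(N[k] for k in range(W + 1, W + B + 1))
    return N[0] + Fraction(tail, W + 1)

# Hypothetical sub-symbol sizes N_0 >= N_1 >= ... >= N_K with K = B + W.
B, W = 3, 2
N = [4, 4, 3, 3, 2, 1]
print("R_0 from (58):   ", naive_rate(N, B, W))
print("R(B,W) from (25):", prospicient_rate(N, B, W))
```

Here the $B$ "tail" sub-symbols cost $N_3 + N_4 + N_5 = 6$ bits per sample without binning, but only $6/(W+1) = 2$ bits per sample after the Slepian-Wolf step spreads them over the $W + 1$ unerased packets.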

2) Slepian-Wolf Coding: There is a strong temporal correlation between the sequences $c_i^n$ in (57). As shown in Fig. 6, as we proceed along any diagonal, the sub-symbols $c_{i,j}$ and $c_{i+1,j+1}$ contain the same underlying set of innovation bits, i.e., those of the sub-symbol $s_{i-j,0}$.

To exploit the correlation, we independently bin the codeword sequences $c_i^n$ into $2^{nR}$ bins at each time. We let $R = R(B,W) + \epsilon$, where $R(B,W)$ is given in (25), and transmit only the bin index of the associated codeword, i.e., $f_i = \mathcal{F}(c_i^n) \in \{1, 2, \ldots, 2^{nR}\}$.

It remains to show that, given the bin indices $f_i$, the decoder is able to recover the underlying codeword symbols $c_i^n$.
Fig. 5. Schematic of Coding Scheme: Codeword structure. We set p = B + W + 1.

Fig. 6. Schematic of Coding Scheme: Rate reduction.



Analysis of Slepian-Wolf Coding: Recall that we only transmit the bin index $f_i$ of $c_i^n$. The receiver recovers the underlying sequence $c_i^n$ as follows:

1) If the receiver has access to $s_{i-1}^n$ in addition to $f_i$, it can recover $c_i^n$ if

$$R > H(c_i \mid s_{i-1}) = H(c_{i,0}) = N_0, \qquad (59)$$

where the first equality follows since $c_{i,1}, \ldots, c_{i,B}$ are all deterministic functions of $s_{i,1}, \ldots, s_{i,B}$, which in turn are deterministic functions of $s_{i-1}$, while $c_{i,0} = s_{i,0}$ is independent of $s_{i-1}$. Clearly (59) is satisfied by our choice of $R$ in (25).

2) If the decoder has access to $s_{i-B-W-1}^n$ and $\{f_{i-W}, f_{i-W+1}, \ldots, f_i\}$, it is able to recover $\{c_{i-W}^n, \ldots, c_i^n\}$ if

$$\begin{aligned}
(W+1)R &> H(c_i, c_{i-1}, \ldots, c_{i-W} \mid s_{i-B-W-1})\\
&= \sum_{k=0}^{W} H(c_{i-k,0}) + \sum_{k=1}^{B} H(c_{i-W,k}) \qquad (60)\\
&= (W+1)N_0 + \sum_{k=1}^{B} N_{W+k}, \qquad (61)
\end{aligned}$$

where (60) comes from the diagonal correlation property illustrated in Fig. 6. Our choice of $R$ in (25) guarantees that (61) is satisfied.
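The conditions (59) and (61) can also be checked numerically. The sketch below assumes, for illustration only, that the sub-symbols are vectors over GF(2), that the innovation bits are i.i.d. Bernoulli(1/2), and that each $R_{j,j-1}$ is drawn at random subject to full row rank. Under these assumptions every sub-symbol is a vector of linear forms in the innovation bits, the entropy (in bits) of any collection of such forms equals its GF(2) rank, and hence $H(X \mid Y) = \operatorname{rank}(X, Y) - \operatorname{rank}(Y)$. The helper names (`gf2_rank`, `check_decoding`, ...) are hypothetical.

```python
import random

def gf2_rank(rows):
    """Rank over GF(2) of vectors stored as int bitmasks."""
    pivots = {}
    for v in rows:
        while v:
            h = v.bit_length() - 1
            if h in pivots:
                v ^= pivots[h]
            else:
                pivots[h] = v
                break
    return len(pivots)

def random_full_row_rank(nrows, ncols, rng):
    """Random nrows x ncols GF(2) matrix (each row an ncols-bit mask) of rank nrows."""
    while True:
        rows = [rng.getrandbits(ncols) for _ in range(nrows)]
        if gf2_rank(rows) == nrows:
            return rows

def apply(matrix, forms):
    """Multiply a GF(2) matrix by a vector whose entries are linear forms (bitmasks)."""
    out = []
    for row in matrix:
        acc = 0
        for c, form in enumerate(forms):
            if (row >> c) & 1:
                acc ^= form
        out.append(acc)
    return out

def check_decoding(N, B, W, seed=0):
    """Return (H(c_i | s_{i-1}), H(c_{i-W..i} | s_{i-B-W-1})) in bits, with K = B + W."""
    rng = random.Random(seed)
    K = B + W
    assert len(N) == K + 1
    R = {j: random_full_row_rank(N[j], N[j - 1], rng) for j in range(1, K + 1)}
    i = 2 * K + 1                # then s_{i-B-W-1} = s_K depends only on times >= 0
    T = i + W + 1                # build the source at times 0, ..., i + W
    sub = [[None] * (K + 1) for _ in range(T)]
    for t in range(T):
        # innovation bits of time t get their own bit positions in the masks
        sub[t][0] = [1 << (t * N[0] + b) for b in range(N[0])]
        for j in range(1, min(t, K) + 1):
            sub[t][j] = apply(R[j], sub[t - 1][j - 1])   # s_{t,j} = R_{j,j-1} s_{t-1,j-1}

    def source(t):
        return [f for j in range(K + 1) for f in sub[t][j]]

    def codeword(t):
        # c_t = (s_{t,0}, s_{t+W,W+1}, ..., s_{t+W,W+B}) as in (57)
        return sub[t][0] + [f for k in range(1, B + 1) for f in sub[t + W][W + k]]

    def cond_entropy(x, y):
        return gf2_rank(x + y) - gf2_rank(y)

    h59 = cond_entropy(codeword(i), source(i - 1))
    window = [f for t in range(i - W, i + 1) for f in codeword(t)]
    h61 = cond_entropy(window, source(i - B - W - 1))
    return h59, h61

N, B, W = [3, 3, 2, 2, 1], 2, 2
print(check_decoding(N, B, W))
```

For these sizes the two entropies should equal $N_0 = 3$ and $(W+1)N_0 + N_3 + N_4 = 12$, matching (59) and (61); the result does not depend on the seed because only the ranks of the full-row-rank matrices matter.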



D. Rate-Optimality of the Coding Scheme 

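We specialize the general lower bound established in Theorem 1 to the case of diagonally correlated deterministic sources. Using (56) and $p = B + W + 1$ we have

$$\begin{aligned}
R &\ge H(s_1 \mid s_0) + \frac{1}{W+1} I(s_p; s_B \mid s_0)\\
&= H(s_i \mid s_{i-1}) + \frac{1}{W+1}\big\{H(s_i \mid s_{i-p}) - H(s_i \mid s_{i-W-1})\big\}\\
&= H(s_{i,0}) + \frac{1}{W+1}\Big\{H\big(s_{i,0}, R_{1,0}s_{i-1,0}, \ldots, R_{p-1,0}s_{i-p+1,0}\big) - H\big(s_{i,0}, R_{1,0}s_{i-1,0}, \ldots, R_{W,0}s_{i-W,0}\big)\Big\}. \qquad (62)
\end{aligned}$$

Here the last step uses (56): $s_i$ depends only on the innovations at times $i-p+1, \ldots, i$, which are independent of $s_{i-p}$, while, given $s_{i-W-1}$, the sub-symbols $R_{k,0}s_{i-k,0}$ with $k \ge W+1$ are known. Since the innovation bits of each source are drawn i.i.d., (62) reduces to

$$\begin{aligned}
R &\ge H(s_{i,0}) + \frac{1}{W+1}\Big(H(s_{i,0}) + \sum_{k=1}^{p-1} H(R_{k,0}s_{i-k,0}) - H(s_{i,0}) - \sum_{k=1}^{W} H(R_{k,0}s_{i-k,0})\Big) \qquad (63)\\
&= N_0 + \frac{1}{W+1}\Big(\sum_{k=1}^{p-1} N_k - \sum_{k=1}^{W} N_k\Big) \qquad (64)\\
&= N_0 + \frac{1}{W+1}\sum_{k=W+1}^{B+W} N_k, \qquad (65)
\end{aligned}$$

where (64) follows from the fact that the $R_{k,0}$ are $N_k \times N_0$ full-rank matrices of rank $N_k$. Since (65) equals (25) for $K = B + W$, the optimality of the proposed scheme is established.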



VI. Linear Semi-Deterministic Sources 

In this section we consider the class of linear semi-deterministic sources defined in Def. 2. Recall that for such a source the deterministic component $s_{i,d}$ is obtained from the previous symbol $s_{i-1}$ through a linear transformation, i.e.,

$$s_{i,d} = \begin{bmatrix} A & B \end{bmatrix}\begin{pmatrix} s_{i-1,0}\\ s_{i-1,d}\end{pmatrix}.$$

As discussed below, the transfer matrix [A B] can be converted into a block-diagonal form through suitable invertible linear transformations, thus resulting in a diagonally correlated deterministic source. The prospicient coding scheme discussed earlier can then be applied to the transformed source.



A. Case 1 

Our transformation is most natural for the case when A is a full row-rank matrix, so we treat this case first. Let

$$N_1 = \operatorname{Rank}(A) \le \min\{N_0, N_d\}. \qquad (66)$$

In this section we restrict to the special case where $N_1 = N_d$, i.e., A is a full-row-rank matrix with $N_d$ linearly independent rows. For this case, we explain the coding scheme by describing the encoder and decoder shown in Fig. 7.





Fig. 7. Block diagram of the system described in Case 1 (blocks: $\mathcal{L}(\cdot)$, Prospicient Encoder, Burst Erasure Channel, Prospicient Decoder, $\mathcal{L}^{-1}(\cdot)$).



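1) Encoder: As in Fig. 7, the encoder applies a memoryless transformation block $\mathcal{L}(\cdot)$ to each symbol to yield $\hat{s}_i = \mathcal{L}(s_i)$, a diagonally correlated deterministic source. We discuss the mapping $\mathcal{L}(\cdot)$ below.

Suppose that X is a matrix of dimensions $N_0 \times N_d$. Define

$$M = \begin{bmatrix} I & X\\ 0 & I\end{bmatrix}, \qquad (67)$$

and observe that

$$M^{-1} = \begin{bmatrix} I & -X\\ 0 & I\end{bmatrix}. \qquad (68)$$

For a certain X to be specified later, let

$$\begin{aligned}
s_{i,d} &= \begin{bmatrix} A & B\end{bmatrix} M^{-1} M \begin{pmatrix} s_{i-1,0}\\ s_{i-1,d}\end{pmatrix} \qquad (69)\\
&= \begin{bmatrix} A & B - AX\end{bmatrix}\begin{pmatrix} \hat{s}_{i-1,0}\\ s_{i-1,d}\end{pmatrix}, \qquad (70)
\end{aligned}$$

where $\hat{s}_{i-1,0} = s_{i-1,0} + Xs_{i-1,d}$. Since A is a full-row-rank matrix, we may select X such that

$$B - AX = 0. \qquad (71)$$

With this choice of X, (70) reduces to

$$s_{i,d} = \begin{bmatrix} A & 0\end{bmatrix}\begin{pmatrix} \hat{s}_{i-1,0}\\ s_{i-1,d}\end{pmatrix} = A\hat{s}_{i-1,0}. \qquad (72)$$

Now, define the linear transformation $\mathcal{L}(\cdot)$ as follows:

$$\hat{s}_i = \begin{pmatrix}\hat{s}_{i,0}\\ \hat{s}_{i,1}\end{pmatrix} = \mathcal{L}(s_i) = M\begin{pmatrix} s_{i,0}\\ s_{i,d}\end{pmatrix} = \begin{pmatrix} s_{i,0} + Xs_{i,d}\\ s_{i,d}\end{pmatrix}. \qquad (73)$$

Note that 1) the transformation $\mathcal{L}(\cdot)$ is memoryless and requires no knowledge of the past source sequences; 2) the innovation bits $s_{i,0}$ are drawn independently of $s_{i,d}$, hence the bits $\hat{s}_{i,0}$ are drawn i.i.d. according to Bernoulli(1/2) and are independent of $\hat{s}_{i,1} = s_{i,d}$; 3) the map between the two sources $s_i$ and $\hat{s}_i$ is one-to-one.

Observe that $\hat{s}_i$ is a diagonally correlated Markov source with $N_0$ innovation bits $\hat{s}_{i,0}$ and $N_d$ deterministic bits $\hat{s}_{i,1}$ that satisfy

$$\hat{s}_{i,1} = A\hat{s}_{i-1,0}. \qquad (74)$$

We transmit the source sequence $\{\hat{s}_i\}$ using the prospicient coding scheme.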
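The construction of X in (71) and the resulting relation (74) can be sketched over GF(2), where $-X = X$. The sketch below is only an illustration under stated assumptions: binary sources, a randomly drawn full-row-rank A and an arbitrary matrix B (called `Bm` in the code to avoid a clash with the burst length B); X is obtained by Gaussian elimination with the free variables set to zero, which is one of possibly many valid choices. All function names are hypothetical.

```python
import random

def rank_gf2(M):
    """Rank over GF(2) of a 0/1 matrix given as a list of rows."""
    pivots = {}
    for row in M:
        v = int("".join(map(str, row)), 2)
        while v:
            h = v.bit_length() - 1
            if h in pivots:
                v ^= pivots[h]
            else:
                pivots[h] = v
                break
    return len(pivots)

def solve_gf2(A, Bm):
    """Return X with A X = Bm over GF(2), assuming A has full row rank."""
    m, n, p = len(A), len(A[0]), len(Bm[0])
    aug = [A[r][:] + Bm[r][:] for r in range(m)]
    pivot_cols, row = [], 0
    for col in range(n):
        sel = next((r for r in range(row, m) if aug[r][col]), None)
        if sel is None:
            continue
        aug[row], aug[sel] = aug[sel], aug[row]
        for r in range(m):
            if r != row and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[row])]
        pivot_cols.append(col)
        row += 1
        if row == m:
            break
    assert row == m, "A must have full row rank"
    X = [[0] * p for _ in range(n)]          # free variables set to zero
    for r, col in enumerate(pivot_cols):
        X[col] = aug[r][n:]
    return X

def matvec(M, v):
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in M]

def vadd(u, v):
    return [a ^ b for a, b in zip(u, v)]

def verify_case1(N0, Nd, steps=50, seed=1):
    rng = random.Random(seed)
    bits = lambda k: [rng.randint(0, 1) for _ in range(k)]
    while True:                               # random full-row-rank A (Nd x N0)
        A = [bits(N0) for _ in range(Nd)]
        if rank_gf2(A) == Nd:
            break
    Bm = [bits(Nd) for _ in range(Nd)]        # the matrix B of the source model
    X = solve_gf2(A, Bm)                      # (71): A X = B, i.e. B - A X = 0
    s0, sd = bits(N0), bits(Nd)
    hat0_prev = vadd(s0, matvec(X, sd))       # (73): hat s_{i,0} = s_{i,0} + X s_{i,d}
    for _ in range(steps):
        sd = vadd(matvec(A, s0), matvec(Bm, sd))   # s_{i,d} = A s_{i-1,0} + B s_{i-1,d}
        s0 = bits(N0)                              # fresh innovation bits
        hat0, hat1 = vadd(s0, matvec(X, sd)), sd
        if hat1 != matvec(A, hat0_prev):           # check (74)
            return False
        hat0_prev = hat0
    return True

print(verify_case1(N0=5, Nd=3))
```

The check only uses the source recursion, so it holds for any initial state; this mirrors the fact that $\mathcal{L}(\cdot)$ is memoryless.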
Fig. 8. Block diagram for general linear semi-deterministic Markov sources.



2) Decoder: At the receiver, the prospicient decoder first recovers the diagonally correlated source $\hat{s}_i$ at all times except those in the error propagation window. Then, whenever $\hat{s}_i$ is available, the decoder directly constructs $s_i$ as

$$s_i = \mathcal{L}^{-1}(\hat{s}_i) = M^{-1}\hat{s}_i. \qquad (75)$$



3) Rate-optimality: Suppose that our two-step approach in Fig. 7 were sub-optimal. Then, in order to transmit $\hat{s}_i$ through the channel, one could first transform it into $s_i$ via $\mathcal{L}^{-1}(\cdot)$ and achieve a lower rate than the prospicient coding scheme. However, this is impossible because the prospicient scheme is optimal for $\hat{s}_i$. This shows the optimality of the coding scheme.

B. Case 2 

Now we consider the general case of semi-deterministic Markov sources defined in Def. 2. As illustrated in Fig. 8, the reduction to the diagonally correlated source is done in two steps using two linear transforms: $\mathcal{L}_f(\cdot)$ and $\mathcal{L}_b(\cdot)$.

Lemma 1. Any semi-deterministic Markov source specified in Def. 2, or equivalently by (27), can be transformed into an equivalent source $\bar{s}_i$ consisting of an innovation component $\bar{s}_{i,0} \in \{0,1\}^{N_0}$ and K deterministic components that satisfy (76), at the top of the next page, using a one-to-one linear transformation $\mathcal{L}_f$, where

1) $\bar{s}_{i,j} \in \{0,1\}^{N_j}$ for $j \in \{0, \ldots, K\}$, where

$$N_0 \ge N_1 \ge \cdots \ge N_K \qquad (78)$$

and $\sum_{k=1}^{K} N_k = N_d$;

2) $R_{j,j-1}$ is an $N_j \times N_{j-1}$ full-rank matrix of rank $N_j$ for $j \in \{1, \ldots, K-1\}$;

3) the matrix $R_{K,K-1}$ is either a full-rank matrix of rank $N_K$ or the zero matrix.

The transformation to $\bar{s}_i$ involves repeated application of the technique in Case 1. The proof is provided in Appendix B, and it provides an explicit construction of $\mathcal{L}_f$.

Lemma 2. Consider the source $\bar{s}_i = \mathcal{L}_f(s_i)$, where $s_i$ is a semi-deterministic Markov source and $\bar{s}_i$ is defined in (76). There exists a one-to-one linear transformation $\mathcal{L}_b$ which maps $\bar{s}_i$ to a diagonally correlated deterministic Markov source $\hat{s}_i$ that satisfies (77).

To illustrate the idea, here we study a simple example; the complete proof is available in Appendix C. Assume $K = 2$ and consider the source consisting of $N_0$ innovation bits $s_{i,0}$ and $N_1 + N_2$ deterministic bits given by

$$\begin{pmatrix} s_{i,1}\\ s_{i,2}\end{pmatrix} = \begin{bmatrix} R_{1,0} & R_{1,1} & R_{1,2}\\ 0 & R_{2,1} & R_{2,2}\end{bmatrix}\begin{pmatrix} s_{i-1,0}\\ s_{i-1,1}\\ s_{i-1,2}\end{pmatrix}, \qquad (79)$$

where $R_{1,0}$ and $R_{2,1}$ are full-rank (non-zero) matrices of rank $N_1$ and $N_2$, respectively.

The following steps transform the source into a diagonally correlated Markov source.
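Step 1: Define

$$\begin{pmatrix} \tilde{s}_{i,1}\\ \tilde{s}_{i,2}\end{pmatrix} = \begin{bmatrix} I & X_1\\ 0 & I\end{bmatrix}\begin{pmatrix} s_{i,1}\\ s_{i,2}\end{pmatrix} \qquad (80)$$

and

$$D_1 = \begin{bmatrix} I_{N_0} & 0 & 0\\ 0 & I & X_1\\ 0 & 0 & I\end{bmatrix}, \qquad (81)$$

where $X_1$ is an $N_1 \times N_2$ matrix, and note that

$$D_1^{-1} = \begin{bmatrix} I & 0 & 0\\ 0 & I & -X_1\\ 0 & 0 & I\end{bmatrix}. \qquad (82)$$

With these definitions it is not hard to check that

$$\begin{aligned}
\begin{pmatrix} \tilde{s}_{i,1}\\ \tilde{s}_{i,2}\end{pmatrix} &= \begin{bmatrix} I & X_1\\ 0 & I\end{bmatrix}\begin{bmatrix} R_{1,0} & R_{1,1} & R_{1,2}\\ 0 & R_{2,1} & R_{2,2}\end{bmatrix} D_1^{-1} D_1 \begin{pmatrix} s_{i-1,0}\\ s_{i-1,1}\\ s_{i-1,2}\end{pmatrix}\\
&= \begin{bmatrix} R_{1,0} & \tilde{R}_{1,1} & \tilde{R}_{1,2}\\ 0 & R_{2,1} & R_{2,2} - R_{2,1}X_1\end{bmatrix}\begin{pmatrix} s_{i-1,0}\\ \tilde{s}_{i-1,1}\\ \tilde{s}_{i-1,2}\end{pmatrix}, \qquad (83)
\end{aligned}$$

where

$$\tilde{R}_{1,1} = R_{1,1} + X_1R_{2,1}, \qquad (84)$$

$$\tilde{R}_{1,2} = R_{1,2} + X_1R_{2,2} - X_1R_{2,1}X_1 - R_{1,1}X_1. \qquad (85)$$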

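Since $R_{2,1}$ has full row rank $N_2$ and $R_{2,2}$ is an $N_2 \times N_2$ matrix, $X_1$ can be selected such that

$$R_{2,2} - R_{2,1}X_1 = 0, \qquad (86)$$

and (83) reduces to

$$\begin{pmatrix} \tilde{s}_{i,1}\\ \tilde{s}_{i,2}\end{pmatrix} = \begin{bmatrix} R_{1,0} & \tilde{R}_{1,1} & \tilde{R}_{1,2}\\ 0 & R_{2,1} & 0\end{bmatrix}\begin{pmatrix} s_{i-1,0}\\ \tilde{s}_{i-1,1}\\ \tilde{s}_{i-1,2}\end{pmatrix}. \qquad (87)$$

Step 2: Define

$$\hat{s}_{i,0} = \begin{bmatrix} I & X_{1,2} & X_{2,2}\end{bmatrix}\begin{pmatrix} s_{i,0}\\ \tilde{s}_{i,1}\\ \tilde{s}_{i,2}\end{pmatrix} \qquad (88)$$

and

$$D_2 = \begin{bmatrix} I & X_{1,2} & X_{2,2}\\ 0 & I & 0\\ 0 & 0 & I\end{bmatrix}. \qquad (89)$$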
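$$\bar{s}_{i,d} = \begin{pmatrix} \bar{s}_{i,1}\\ \bar{s}_{i,2}\\ \vdots\\ \bar{s}_{i,K-1}\\ \bar{s}_{i,K}\end{pmatrix} = \begin{bmatrix} R_{1,0} & R_{1,1} & \cdots & R_{1,K-2} & R_{1,K-1} & R_{1,K}\\ 0 & R_{2,1} & \cdots & R_{2,K-2} & R_{2,K-1} & R_{2,K}\\ \vdots & & \ddots & & & \vdots\\ 0 & 0 & \cdots & R_{K-1,K-2} & R_{K-1,K-1} & R_{K-1,K}\\ 0 & 0 & \cdots & 0 & R_{K,K-1} & R_{K,K}\end{bmatrix}\begin{pmatrix} \bar{s}_{i-1,0}\\ \bar{s}_{i-1,1}\\ \vdots\\ \bar{s}_{i-1,K-2}\\ \bar{s}_{i-1,K-1}\\ \bar{s}_{i-1,K}\end{pmatrix} \qquad (76)$$

$$\hat{s}_{i,d} = \begin{pmatrix} \hat{s}_{i,1}\\ \hat{s}_{i,2}\\ \vdots\\ \hat{s}_{i,K}\end{pmatrix} = \begin{bmatrix} R_{1,0} & 0 & \cdots & 0\\ 0 & R_{2,1} & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & R_{K,K-1}\end{bmatrix}\begin{pmatrix} \hat{s}_{i-1,0}\\ \hat{s}_{i-1,1}\\ \vdots\\ \hat{s}_{i-1,K-1}\end{pmatrix} \qquad (77)$$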

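and note that

$$D_2^{-1} = \begin{bmatrix} I & -X_{1,2} & -X_{2,2}\\ 0 & I & 0\\ 0 & 0 & I\end{bmatrix}.$$

It can be observed that

$$\begin{aligned}
\begin{pmatrix} \tilde{s}_{i,1}\\ \tilde{s}_{i,2}\end{pmatrix} &= \begin{bmatrix} R_{1,0} & \tilde{R}_{1,1} & \tilde{R}_{1,2}\\ 0 & R_{2,1} & 0\end{bmatrix} D_2^{-1} D_2 \begin{pmatrix} s_{i-1,0}\\ \tilde{s}_{i-1,1}\\ \tilde{s}_{i-1,2}\end{pmatrix} \qquad (90)\\
&= \begin{bmatrix} R_{1,0} & \tilde{R}_{1,1} & \tilde{R}_{1,2}\\ 0 & R_{2,1} & 0\end{bmatrix} D_2^{-1} \begin{pmatrix} \hat{s}_{i-1,0}\\ \tilde{s}_{i-1,1}\\ \tilde{s}_{i-1,2}\end{pmatrix} \qquad (91)\\
&= \begin{bmatrix} R_{1,0} & \tilde{R}_{1,1} - R_{1,0}X_{1,2} & \tilde{R}_{1,2} - R_{1,0}X_{2,2}\\ 0 & R_{2,1} & 0\end{bmatrix}\begin{pmatrix} \hat{s}_{i-1,0}\\ \tilde{s}_{i-1,1}\\ \tilde{s}_{i-1,2}\end{pmatrix}. \qquad (92)
\end{aligned}$$

Similarly, since $R_{1,0}$ has full row rank, $X_{1,2}$ and $X_{2,2}$ can be selected such that

$$\tilde{R}_{1,1} - R_{1,0}X_{1,2} = 0, \qquad (93)$$

$$\tilde{R}_{1,2} - R_{1,0}X_{2,2} = 0. \qquad (94)$$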

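Therefore, the source consists of $N_0$ innovation bits $\hat{s}_{i,0}$ and $N_1 + N_2$ deterministic bits $\hat{s}_{i,1} = \tilde{s}_{i,1}$ and $\hat{s}_{i,2} = \tilde{s}_{i,2}$ given by

$$\begin{aligned}
\begin{pmatrix} \hat{s}_{i,1}\\ \hat{s}_{i,2}\end{pmatrix} &= \begin{bmatrix} R_{1,0} & 0 & 0\\ 0 & R_{2,1} & 0\end{bmatrix}\begin{pmatrix} \hat{s}_{i-1,0}\\ \hat{s}_{i-1,1}\\ \hat{s}_{i-1,2}\end{pmatrix} \qquad (95)\\
&= \begin{bmatrix} R_{1,0} & 0\\ 0 & R_{2,1}\end{bmatrix}\begin{pmatrix} \hat{s}_{i-1,0}\\ \hat{s}_{i-1,1}\end{pmatrix}. \qquad (96)
\end{aligned}$$

Clearly, $\hat{s}_i = D_2D_1s_i$ is a diagonally correlated deterministic Markov source, and the mapping is invertible.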
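The two steps of the $K = 2$ example can be replayed numerically over GF(2), where all minus signs in (82)-(94) become plus signs. The sketch below assumes binary sources and randomly drawn matrices with $R_{1,0}$ and $R_{2,1}$ of full row rank; it solves (86), (93) and (94) by Gaussian elimination (free variables set to zero, one of possibly many valid choices) and then checks (95)-(96) along a simulated trajectory. All function names are hypothetical.

```python
import random

def rank_gf2(M):
    pivots = {}
    for row in M:
        v = int("".join(map(str, row)), 2)
        while v:
            h = v.bit_length() - 1
            if h in pivots:
                v ^= pivots[h]
            else:
                pivots[h] = v
                break
    return len(pivots)

def solve_gf2(A, Bm):
    """Return X with A X = Bm over GF(2), assuming A has full row rank."""
    m, n, p = len(A), len(A[0]), len(Bm[0])
    aug = [A[r][:] + Bm[r][:] for r in range(m)]
    pivot_cols, row = [], 0
    for col in range(n):
        sel = next((r for r in range(row, m) if aug[r][col]), None)
        if sel is None:
            continue
        aug[row], aug[sel] = aug[sel], aug[row]
        for r in range(m):
            if r != row and aug[r][col]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[row])]
        pivot_cols.append(col)
        row += 1
        if row == m:
            break
    assert row == m, "A must have full row rank"
    X = [[0] * p for _ in range(n)]
    for r, col in enumerate(pivot_cols):
        X[col] = aug[r][n:]
    return X

def matmul(P, Q):
    return [[sum(P[i][k] & Q[k][j] for k in range(len(Q))) % 2
             for j in range(len(Q[0]))] for i in range(len(P))]

def madd(*Ms):
    """Entry-wise sum over GF(2); subtraction is the same operation."""
    return [[sum(v) % 2 for v in zip(*rows)] for rows in zip(*Ms)]

def mv(M, v):
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in M]

def vadd(*vs):
    return [sum(v) % 2 for v in zip(*vs)]

def verify_k2(N0, N1, N2, steps=50, seed=0):
    rng = random.Random(seed)
    rand = lambda r, c: [[rng.randint(0, 1) for _ in range(c)] for _ in range(r)]

    def full_row_rank(r, c):
        while True:
            M = rand(r, c)
            if rank_gf2(M) == r:
                return M

    R10, R11, R12 = full_row_rank(N1, N0), rand(N1, N1), rand(N1, N2)
    R21, R22 = full_row_rank(N2, N1), rand(N2, N2)
    X1 = solve_gf2(R21, R22)                                  # Step 1, (86)
    Rt11 = madd(R11, matmul(X1, R21))                         # (84)
    Rt12 = madd(R12, matmul(X1, R22), matmul(matmul(X1, R21), X1),
                matmul(R11, X1))                              # (85) over GF(2)
    X12, X22 = solve_gf2(R10, Rt11), solve_gf2(R10, Rt12)     # Step 2, (93)-(94)

    def transform(s0, s1, s2):
        t1 = vadd(s1, mv(X1, s2))                    # (80)
        h0 = vadd(s0, mv(X12, t1), mv(X22, s2))      # (88)
        return h0, t1, s2

    s0, s1, s2 = rand(1, N0)[0], rand(1, N1)[0], rand(1, N2)[0]
    prev = transform(s0, s1, s2)
    for _ in range(steps):
        s1, s2 = (vadd(mv(R10, s0), mv(R11, s1), mv(R12, s2)),   # source (79)
                  vadd(mv(R21, s1), mv(R22, s2)))
        s0 = rand(1, N0)[0]                                       # fresh innovation
        cur = transform(s0, s1, s2)
        if cur[1] != mv(R10, prev[0]) or cur[2] != mv(R21, prev[1]):  # (95)-(96)
            return False
        prev = cur
    return True

print(verify_k2(N0=5, N1=3, N2=2))
```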

Exploiting Lemmas 1 and 2, any linear semi-deterministic source is first transformed into a diagonally correlated deterministic Markov source $\hat{s}_i = \mathcal{L}_b(\mathcal{L}_f(s_i))$, which is then transmitted through the channel using the prospicient coding scheme. The block diagram of the encoder and decoder is shown in Fig. 8. The optimality of the scheme can be shown using an argument similar to that in Sec. VI-A3.

VII. Gaussian Sources: Proof of Theorem 3

In this section we investigate the Gaussian source model with the window recovery constraints explained in Sec. III-C. We first argue that it is sufficient to consider the case K = B + W, and then study the coding scheme and its rate-optimality. Finally, the rates achieved by several other schemes are provided for comparison.

A. Sufficiency of K = B + W 

First, we argue that it is sufficient to consider the case $K = B + W$. In particular, if $K < B + W$, we can assume that the decoder, instead of recovering the source $t_i = (s_i, s_{i-1}, \ldots, s_{i-K})$ at time $i$ within distortion $d$, aims to recover the source $t'_i = (s_i, \ldots, s_{i-K'})$ within distortion $d'$, where $K' = B + W$ and

$$d'_j = \begin{cases} d_j, & \text{for } j \in \{0, 1, \ldots, K\},\\ 1, & \text{for } j \in \{K+1, \ldots, K'\}.\end{cases} \qquad (97)$$

In the case $K > B + W$, at each time $i$, $s_{i-j}$ is required to be recovered within distortion $d_j$ for $j \in \{B + W + 1, \ldots, K\}$; however, a better reconstruction is always available from the past. In particular, according to the problem description, the decoder at each time $i$ has recovered $t_{i-1}$ or $t_{i-B-W-1}$. In the former case, $\{s_{i-j}\}_{j=B+W+1}^{K}$ is available from time $i-1$ with distortion $d_{j-1} \le d_j$, and in the latter case, $\{s_{i-j}\}_{j=B+W+1}^{K}$ is available from time $i-B-W-1$ with distortion $d_{j-W-B-1} \le d_j$. Thus one can simply solve the problem for the case $K = B + W$.

B. Coding Scheme 

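In this section, we propose a coding scheme based on successive refinement of the Gaussian source. The block diagram of the scheme is shown in Fig. 9.

1) Successive Refinement (SR) Encoder: The structure of the SR encoder is shown in Fig. 10. The encoder at time $i$ encodes the source $s_i^n$ using a $(B+1)$-layer successive refinement coding scheme [18], [19] to generate $B+1$ codewords whose indices are given by $\{m_{i,0}, m_{i,1}, \ldots, m_{i,B-1}, m_{i,B}\}$, where $m_{i,j} \in \{1, 2, \ldots, 2^{n\tilde{R}_j}\}$ for $j \in \{0, 1, \ldots, B\}$ and

$$\tilde{R}_j = \begin{cases} \frac{1}{2}\log\left(\frac{d_{W+1}}{d_0}\right), & \text{for } j = 0,\\ \frac{1}{2}\log\left(\frac{d_{W+j+1}}{d_{W+j}}\right), & \text{for } j \in \{1, 2, \ldots, B-1\},\\ \frac{1}{2}\log\left(\frac{1}{d_{W+B}}\right), & \text{for } j = B.\end{cases} \qquad (98)$$

The $j$-th layer uses the indices

$$\{m_{i,j}, m_{i,j+1}, \ldots, m_{i,B}\} \qquad (99)$$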
Fig. 10. (B + 1)-layer coding scheme based on Successive Refinement.



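for reproduction, and the associated rate is given by

$$R_j = \sum_{l=j}^{B}\tilde{R}_l = \begin{cases} \frac{1}{2}\log\left(\frac{1}{d_0}\right), & \text{for } j = 0,\\ \frac{1}{2}\log\left(\frac{1}{d_{W+j}}\right), & \text{for } j \in \{1, 2, \ldots, B\},\end{cases} \qquad (100)$$

and the corresponding distortion associated with layer $j$ equals $d_0$ if $j = 0$ and $d_{W+j}$ for $j = 1, \ldots, B$.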
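The telescoping in (100) is easy to check numerically. The sketch below assumes, as the $\frac{1}{2}\log(1/d)$ expressions suggest, a unit-variance source, base-2 logarithms (rates in bits), $B \ge 1$ and $d_0 \le d_{W+1} \le \cdots \le d_{W+B} \le 1$; the distortion values and function names are hypothetical. It also evaluates the rate $R_0 + \frac{1}{W+1}\sum_{k=1}^{B} R_k$ obtained later from the prospicient scheme.

```python
import math

def layer_rates(d, B, W):
    """Incremental SR rates (98) and cumulative layer rates (100), in bits.

    d = [d_0, ..., d_{B+W}]; assumes a unit-variance source, B >= 1 and
    d_0 <= d_{W+1} <= ... <= d_{W+B} <= 1.
    """
    assert B >= 1
    inc = []
    for j in range(B + 1):
        if j == 0:
            ratio = d[W + 1] / d[0]
        elif j < B:
            ratio = d[W + j + 1] / d[W + j]
        else:
            ratio = 1.0 / d[W + B]
        inc.append(0.5 * math.log2(ratio))
    cumulative = [sum(inc[j:]) for j in range(B + 1)]   # R_j = sum_{l >= j} R~_l
    return inc, cumulative

def prospicient_gaussian_rate(d, B, W):
    """R_0 + (1/(W+1)) * sum_{k=1}^{B} R_k."""
    _, R = layer_rates(d, B, W)
    return R[0] + sum(R[1:]) / (W + 1)

d, B, W = [0.1, 0.2, 0.25, 0.5, 0.8], 2, 2
inc, R = layer_rates(d, B, W)
print([round(r, 4) for r in inc], [round(r, 4) for r in R])
print(round(prospicient_gaussian_rate(d, B, W), 4))
```

Each cumulative rate $R_j$ collapses to $\frac{1}{2}\log_2(1/d_{W+j})$ (and $R_0$ to $\frac{1}{2}\log_2(1/d_0)$), because the intermediate distortions cancel in the sum.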

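Clearly, for recovering $t_i^n = (s_i^n, \ldots, s_{i-B-W}^n)$ with distortion tuple $(d_0, \ldots, d_{B+W})$, it suffices that the destination have access to

$$M_i = \begin{Bmatrix} m_{i,0} & m_{i-1,0} & \cdots & m_{i-W,0} & & & \\ m_{i,1} & m_{i-1,1} & \cdots & m_{i-W,1} & m_{i-W-1,1} & & \\ \vdots & & & & & \ddots & \\ m_{i,B} & m_{i-1,B} & \cdots & m_{i-W,B} & m_{i-W-1,B} & \cdots & m_{i-W-B,B}\end{Bmatrix}. \qquad (101)$$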

As described next, our coding scheme uses the prospicient code construction of Section V for the diagonally correlated source model to guarantee that the receiver obtains $M_i$ for each $i$ outside the error propagation window.

2) Layer Rearrangement Block: For simplicity, first assume that all rates $\tilde{R}_j$ are integers. The results can be generalized to non-integer rates as explained in Appendix D. Each index $m_{i,j}$ is isomorphic to a length-$n$ sequence $b_{i,j}^n$ over the alphabet $\mathcal{B}_j = \{0,1\}^{\tilde{R}_j}$, and the indices associated with layer $j$ are isomorphic to a sequence $c_{i,j}^n$ defined as

$$c_{i,j} = (b_{i,j}, b_{i,j+1}, \ldots, b_{i,B}), \qquad (102)$$

and the collection of layers $M_i$ defined in (101) is isomorphic to

$$d_i = (c_{i,0}, c_{i-1,0}, \ldots, c_{i-W,0}, c_{i-W-1,1}, c_{i-W-2,2}, \ldots, c_{i-W-B,B}). \qquad (103)$$

As shown in Fig. 9, the sequence $d_i^n$ is encoded at time $i$ and recovered outside the error propagation window.

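3) Prospicient Encoder/Decoder: It can be readily verified that $d_i^n$ in (103) is a linear diagonally correlated semi-deterministic source as defined in Def. 1. Hence, applying the prospicient coding scheme of Section V, the achievable rate from Prop. 1 is

$$\begin{aligned}
R &= R_0 + \frac{1}{W+1}\sum_{k=1}^{B} R_k \qquad (104)\\
&= \frac{1}{2}\log\left(\frac{1}{d_0}\right) + \frac{1}{2(W+1)}\sum_{j=1}^{B}\log\left(\frac{1}{d_{W+j}}\right). \qquad (105)
\end{aligned}$$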



4) Decoding of $t_i^n$: Using $d_i^n$, which is isomorphic to $M_i$ defined in (101), the decoder is guaranteed to recover $(s_i^n, \ldots, s_{i-W}^n)$ with distortion $d_0$ and the sequences $s_{i-W-1}^n, \ldots, s_{i-W-B}^n$ with distortions $d_{W+1}, \ldots, d_{W+B}$, respectively.



C. Converse for Theorem 3

We need to show that for any sequence of codes that achieves a distortion tuple $(d_0, \ldots, d_{W+B})$, the rate is lower bounded by (105).

As in the proof of Theorem 1, we consider a periodic erasure channel of period $p = B + W + 1$ and assume that the first $B$ positions of each period are erased. Consider

$$\begin{aligned}
(W+1)n(t+1)R &\ge H\big(f_B^{p-1}, f_{p+B}^{2p-1}, \ldots, f_{(t-1)p+B}^{tp-1}, f_{tp+B}^{(t+1)p-1}\big)\\
&= \sum_{k=0}^{t} H\big(f_{kp+B}^{(k+1)p-1} \mid f_B^{p-1}, \ldots, f_{(k-1)p+B}^{kp-1}\big) \qquad (106)\\
&\ge \sum_{k=0}^{t} H\big(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}\big), \qquad (107)
\end{aligned}$$

where the last step follows from the fact that conditioning reduces entropy.

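We next establish the following claim, whose proof is in Appendix E.

Claim 1. For each $k \ge 1$ we have that

$$H\big(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}\big) \ge \frac{n(W+1)}{2}\log\left(\frac{1}{d_0}\right) + \frac{n}{2}\sum_{l=1}^{B}\log\left(\frac{1}{d_{W+l}}\right). \qquad (108)$$

Substituting (108) into (107) and taking $n \to \infty$ and then $t \to \infty$, we recover

$$R \ge \frac{1}{2}\log\left(\frac{1}{d_0}\right) + \frac{1}{2(W+1)}\sum_{j=1}^{B}\log\left(\frac{1}{d_{W+j}}\right), \qquad (109)$$

as required.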



D. Illustrative Sub-optimal Schemes

As explained in Sec. III-C and Fig. 3, the optimal performance is compared with the following sub-optimal schemes.

1) Still-Image Compression: In this scheme, the encoder ignores the decoder's memory and at each time $i$ encodes the source $t_i$ in a memoryless manner and sends the codewords through the channel. The rate associated with this scheme is

$$R_{\mathrm{SI}}(d) = I(t_i; \hat{t}_i) = \sum_{k=0}^{K}\frac{1}{2}\log\left(\frac{1}{d_k}\right). \qquad (110)$$

In this scheme, the decoder is able to recover the source whenever its codeword is available, i.e., at all times except when an erasure happens.

2) Wyner-Ziv Compression with Delayed Side Information: At time $i$ the encoder assumes that $t_{i-B-1}$ has already been reconstructed at the receiver within distortion $d$. With this assumption, it compresses the source $t_i$ according to the Wyner-Ziv scheme and transmits the codewords through the channel. The rate of this scheme is

$$R_{\mathrm{WZ}}(B, d) = I(t_i; \hat{t}_i \mid \hat{t}_{i-B-1}) = \sum_{k=0}^{B}\frac{1}{2}\log\left(\frac{1}{d_k}\right). \qquad (111)$$

Note that if at time $i$, $\hat{t}_{i-B-1}$ is not available, then $\hat{t}_{i-1}$ is available, and the decoder can still use it as side information to construct $\hat{t}_i$, since $I(t_i; \hat{t}_i \mid \hat{t}_{i-B-1}) \ge I(t_i; \hat{t}_i \mid \hat{t}_{i-1})$.

As in the case of Still-Image Compression, the Wyner-Ziv scheme also enables the recovery of each source sequence except those with erased codewords.

3) Predictive Coding plus FEC: This scheme consists of predictive coding (DPC) [1] followed by a Forward Error Correction (FEC) code to compensate for the effect of packet losses on the channel. Since the contribution of $B$ erased codewords needs to be recovered using $W + 1$ available codewords, the rate of this scheme can be computed as follows:

$$\begin{aligned}
R_{\mathrm{FEC}}(B, W, d) &= \frac{B+W+1}{W+1}\, I(t_i; \hat{t}_i \mid \hat{t}_{i-1}) \qquad (112)\\
&= \frac{B+W+1}{2(W+1)}\log\left(\frac{1}{d_0}\right). \qquad (113)
\end{aligned}$$
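For a rough numerical comparison of the schemes in this subsection, the sketch below evaluates (105), (110), and closed forms for the Wyner-Ziv and FEC rates. The closed forms are assumptions on our part: unit-variance source sequences that are i.i.d. across time, base-2 logarithms, a nondecreasing distortion tuple, $R_{\mathrm{WZ}} = \sum_{k=0}^{B}\frac{1}{2}\log(1/d_k)$, and $R_{\mathrm{FEC}} = \frac{B+W+1}{W+1}\cdot\frac{1}{2}\log(1/d_0)$ (the predictive-coding rate scaled by the FEC overhead). The distortion values are hypothetical and the printed numbers are not results from the paper.

```python
import math

def half_log(x):
    return 0.5 * math.log2(1.0 / x)

def rate_prospicient(d, B, W):       # (105)
    return half_log(d[0]) + sum(half_log(d[W + j]) for j in range(1, B + 1)) / (W + 1)

def rate_still_image(d, B, W):       # (110) with K = B + W
    return sum(half_log(dk) for dk in d[:B + W + 1])

def rate_wyner_ziv(d, B, W):         # assumed closed form: sum_{k=0}^{B} 0.5 log(1/d_k)
    return sum(half_log(dk) for dk in d[:B + 1])

def rate_fec(d, B, W):               # assumed: (B+W+1)/(W+1) times 0.5 log(1/d_0)
    return (B + W + 1) / (W + 1) * half_log(d[0])

d, B, W = [0.1, 0.2, 0.25, 0.5, 0.8], 2, 2
for name, f in [("prospicient", rate_prospicient), ("still-image", rate_still_image),
                ("Wyner-Ziv", rate_wyner_ziv), ("predictive+FEC", rate_fec)]:
    print(f"{name:15s} {f(d, B, W):.4f} bits/sample")
```

Under these assumptions the prospicient rate never exceeds the FEC rate, because each $\frac{1}{2}\log(1/d_{W+j})$ is at most $\frac{1}{2}\log(1/d_0)$ when $d_0 \le d_{W+j}$.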



VIII. Symmetric Sources: Proof of Theorem 4

The special case when $W = 0$ follows directly from Theorem 1, since the upper and lower bounds coincide for $W = 0$. We only need to consider the case when $W \ge 1$. For simplicity of exposition we consider the case when $W = 1$. Then we need to show that

$$R(B, W=1) \ge \frac{1}{2}H(s_{B+1}, s_{B+2} \mid s_0). \qquad (114)$$

The proof for general $W > 1$ follows along similar lines and will be sketched briefly.

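Assume that an erasure burst spans the time indices $j - B, \ldots, j - 1$. The decoder must recover

$$s_{j+1}^n = \mathcal{G}_{j+1}\big(f_0^{j-B-1}, f_j, f_{j+1}\big). \qquad (115)$$

From Fano's inequality, we have

$$H\big(s_{j+1}^n \mid f_0^{j-B-1}, f_j, f_{j+1}\big) \le n\epsilon_n. \qquad (116)$$

Furthermore, if there is no erasure until time $j$, then

$$s_j^n = \mathcal{G}_j\big(f_0^{j}\big) \qquad (117)$$

must hold. Hence from Fano's inequality,

$$H\big(s_j^n \mid f_0^{j}\big) \le n\epsilon_n. \qquad (118)$$

Our aim is to combine (116) and (118) to establish the following lower bound on the sum rate:

$$R_j + R_{j+1} \ge H(s_{j+1} \mid s_j) + H(s_j \mid s_{j-B-1}). \qquad (119)$$

The lower bound then follows since

$$\begin{aligned}
R &\ge \max(R_j, R_{j+1}) \qquad (120)\\
&\ge \frac{1}{2}(R_j + R_{j+1}) \qquad (121)\\
&\ge \frac{1}{2}\big(H(s_{j+1} \mid s_j) + H(s_j \mid s_{j-B-1})\big) \qquad (122)\\
&= \frac{1}{2}\big(H(s_{j+1} \mid s_j, s_{j-B-1}) + H(s_j \mid s_{j-B-1})\big) \qquad (123)\\
&= \frac{1}{2}H(s_{j+1}, s_j \mid s_{j-B-1}) = \frac{1}{2}H(s_{B+1}, s_{B+2} \mid s_0), \qquad (124)
\end{aligned}$$

thus establishing (114).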
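To get a feel for (114), the sketch below evaluates its right-hand side and the general lower bound of Theorem 1, $H(s_1 \mid s_0) + \frac{1}{W+1}I(s_p; s_B \mid s_0)$ with $W = 1$ and $p = B + 2$, for a hypothetical stationary binary symmetric Markov chain with crossover probability $q$ (a reversible, hence symmetric, source). Entropies are per-sample and computed by brute-force enumeration of the joint pmf; the chain and all names are illustrative assumptions.

```python
import itertools
import math

def markov_joint(q, length):
    """Joint pmf of (s_0, ..., s_{length-1}) for a stationary binary symmetric
    Markov chain with crossover probability q (uniform initial state)."""
    pmf = {}
    for seq in itertools.product((0, 1), repeat=length):
        p = 0.5
        for a, b in zip(seq, seq[1:]):
            p *= q if a != b else 1.0 - q
        pmf[seq] = p
    return pmf

def entropy(pmf, idx):
    marg = {}
    for seq, p in pmf.items():
        key = tuple(seq[k] for k in idx)
        marg[key] = marg.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)

def cond_entropy(pmf, a, b):
    """H(s_a | s_b) for lists of time indices a and b."""
    return entropy(pmf, list(a) + list(b)) - entropy(pmf, list(b))

def bounds_w1(q, B):
    """(Theorem 1 lower bound, right-hand side of (114)) for W = 1, p = B + 2."""
    pmf = markov_joint(q, B + 3)                  # s_0, ..., s_{B+2}
    p = B + 2
    upper = 0.5 * cond_entropy(pmf, [B + 1, B + 2], [0])
    mutual = cond_entropy(pmf, [p], [0]) - cond_entropy(pmf, [p], [B, 0])
    lower = cond_entropy(pmf, [1], [0]) + 0.5 * mutual
    return lower, upper

for B in (1, 2, 3):
    lo, up = bounds_w1(0.1, B)
    print(B, round(lo, 4), round(up, 4))
```

The general lower bound stays below the right-hand side of (114), which is what the Zig-Zag argument below closes for symmetric sources with memoryless encoders.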

To establish (119) we make a connection to a multi-terminal source coding problem in Fig. 11.

A. Zig-Zag Source Coding 

Consider the source coding problem with side information illustrated in Fig. 11(a). In this setup there are four source sequences drawn i.i.d. from a joint distribution $p(s_{j+1}, s_j, s_{j-1}, s_{j-B-1})$. The two encoders $j$ and $j+1$ are revealed the source sequences $s_j^n$ and $s_{j+1}^n$, and the two decoders $j$ and $j+1$ are revealed the sources $s_{j-1}^n$ and $s_{j-B-1}^n$, respectively. The encoders operate independently and compress the source sequences to $f_j$ and $f_{j+1}$ at rates $R_j$ and $R_{j+1}$, respectively. Decoder $j$ has access to $(f_j, s_{j-1}^n)$ while decoder $j+1$ has access to $(f_j, f_{j+1}, s_{j-B-1}^n)$, and they are interested in reproducing

$$\hat{s}_j^n = \mathcal{G}_j\big(f_j, s_{j-1}^n\big), \qquad (125)$$

$$\hat{s}_{j+1}^n = \mathcal{G}_{j+1}\big(f_j, f_{j+1}, s_{j-B-1}^n\big), \qquad (126)$$

respectively, such that $\Pr(\hat{s}_i^n \ne s_i^n) \le \epsilon_n$ for $i = j, j+1$.

Fig. 11. Connection between the streaming problem and the Zig-Zag source coding problem. The setup on the right is identical to the setup on the left, except that the side information sequence $s_{j-1}^n$ is replaced with $s_{j+1}^n$. However, the rate region for both problems turns out to be identical for symmetric Markov sources.

When $s_{j-B-1}^n$ is a constant sequence, the problem has been studied in [12], [16], where a complete single-letter characterization involving an auxiliary random variable is obtained. Fortunately, in the present case of symmetric sources a simple lower bound can be obtained using the following observation.

Lemma 3. The set of all achievable rate pairs $(R_j, R_{j+1})$ for the problem in Fig. 11(a) is identical to the set of all achievable rate pairs for the problem in Fig. 11(b), where the side information sequence $s_{j-1}^n$ at decoder $j$ is replaced by the side information sequence $s_{j+1}^n$.

The proof of Lemma 3 follows by observing that the rate region for the problem in Fig. 11(a) depends on the joint distribution $p(s_j, s_{j+1}, s_{j-1}, s_{j-B-1})$ only via the marginal distributions $p(s_j, s_{j-1})$ and $p(s_{j+1}, s_j, s_{j-B-1})$. When the source is symmetric, the distributions $p(s_j, s_{j-1})$ and $p(s_j, s_{j+1})$ are identical. The formal proof will be omitted.

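Thus it suffices to lower bound the achievable sum rate for the problem in Fig. 11(b). First, applying the Slepian-Wolf lower bound to encoder $j+1$ gives

$$nR_{j+1} \ge H\big(s_{j+1}^n \mid s_{j-B-1}^n, f_j\big) - n\epsilon_n, \qquad (127)$$

and to bound $R_j$,

$$\begin{aligned}
nR_j &\ge H(f_j) \ge I\big(f_j; s_j^n \mid s_{j-B-1}^n\big)\\
&= H\big(s_j^n \mid s_{j-B-1}^n\big) - H\big(s_j^n \mid s_{j-B-1}^n, f_j\big)\\
&\ge nH(s_{B+1} \mid s_0) - H\big(s_j^n \mid s_{j-B-1}^n, f_j\big) + H\big(s_j^n \mid s_{j-B-1}^n, s_{j+1}^n, f_j\big) - n\epsilon_n \qquad (128)\\
&= nH(s_j \mid s_{j-B-1}) - I\big(s_j^n; s_{j+1}^n \mid s_{j-B-1}^n, f_j\big) - n\epsilon_n\\
&= nH(s_j \mid s_{j-B-1}) - H\big(s_{j+1}^n \mid s_{j-B-1}^n, f_j\big) + H\big(s_{j+1}^n \mid s_{j-B-1}^n, s_j^n, f_j\big) - n\epsilon_n\\
&= nH(s_j \mid s_{j-B-1}) - H\big(s_{j+1}^n \mid s_{j-B-1}^n, f_j\big) + nH(s_{j+1} \mid s_j) - n\epsilon_n, \qquad (129)
\end{aligned}$$

where (128) follows by applying Fano's inequality, since $s_j^n$ can be recovered from $(s_{j+1}^n, f_j)$ and hence $H(s_j^n \mid s_{j-B-1}^n, s_{j+1}^n, f_j) \le n\epsilon_n$ holds, and (129) follows from the Markov relation $s_{j+1}^n \to s_j^n \to (f_j, s_{j-B-1}^n)$. Observe that (119) follows by summing (127) and (129).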

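B. Connection between Streaming and Zig-Zag Coding Problems

It remains to show that the lower bound on the Zig-Zag coding problem also constitutes a lower bound on the original problem.

Lemma 4. Suppose that the encoding function $f_j = \mathcal{F}_j(s_j^n)$ is memoryless. Suppose that there exist decoding functions $\hat{s}_j^n = \mathcal{G}_j(f_0^j)$ and $\hat{s}_{j+1}^n = \mathcal{G}_{j+1}(f_0^{j-B-1}, f_j, f_{j+1})$ such that $\Pr(\hat{s}_j^n \ne s_j^n)$ and $\Pr(\hat{s}_{j+1}^n \ne s_{j+1}^n)$ both vanish as $n \to \infty$. Then

$$H\big(s_j^n \mid s_{j-1}^n, f_j\big) \le n\epsilon_n, \qquad (130)$$

$$H\big(s_{j+1}^n \mid s_{j-B-1}^n, f_j, f_{j+1}\big) \le n\epsilon_n \qquad (131)$$

also hold.

Proof: To establish (130), we note that for the memoryless encoder the following Markov chain holds:

$$f_0^{j-1} \to s_{j-1}^n \to (f_j, s_j^n). \qquad (132)$$

Hence we have that

$$\begin{aligned}
n\epsilon_n &\ge H\big(s_j^n \mid f_0^j\big) \ge H\big(s_j^n \mid f_0^{j-1}, s_{j-1}^n, f_j\big) \qquad (133)\\
&= H\big(s_j^n \mid s_{j-1}^n, f_j\big), \qquad (134)
\end{aligned}$$

where the last step follows via (132). Similarly, using $f_0^{j-B-2} \to s_{j-B-1}^n \to (f_j, f_{j+1}, s_{j+1}^n)$, we have

$$n\epsilon_n \ge H\big(s_{j+1}^n \mid f_0^{j-B-1}, f_j, f_{j+1}\big) \ge H\big(s_{j+1}^n \mid s_{j-B-1}^n, f_0^{j-B-2}, f_j, f_{j+1}\big) = H\big(s_{j+1}^n \mid s_{j-B-1}^n, f_j, f_{j+1}\big). \qquad (135)$$

■

The conditions in (130) and (131) show that any rate that is achievable in the original problem is also achievable in the Zig-Zag source network. Hence a lower bound for the source network also constitutes a lower bound for the original problem.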
C. Extension to Arbitrary W > 1

Finally we comment on the extension of the above approach to $W = 2$. We now consider three encoders $t \in \{j, j+1, j+2\}$. Encoder $t$ observes a source sequence $s_t^n$ and compresses it into an index $f_t \in [1, 2^{nR_t}]$. The corresponding decoders $t \in \{j, j+1\}$ are revealed $s_{t-1}^n$, and decoder $j+2$ is revealed $s_{j-B-1}^n$. By an argument analogous to Lemma 3, the rate region is equivalent to the case when decoders $j$ and $j+1$ are instead revealed $s_{j+1}^n$ and $s_{j+2}^n$, respectively. For this new setup it is easy to show that decoder $j+2$ must reconstruct $(s_j^n, s_{j+1}^n, s_{j+2}^n)$ given $(s_{j-B-1}^n, f_j, f_{j+1}, f_{j+2})$. The sum rate must therefore satisfy $R_j + R_{j+1} + R_{j+2} \ge H(s_j, s_{j+1}, s_{j+2} \mid s_{j-B-1})$. Using an extension of Lemma 4, we can show that the proposed lower bound also continues to hold for the original streaming problem. This completes the proof. The extension to any arbitrary $W > 1$ is completely analogous.

IX. Delay Constrained Decoders: Proof of Theorem 5

A. Achievability

The achievability of the rate expression (34) is established through a Slepian-Wolf coding scheme. A Slepian-Wolf codebook is constructed by partitioning the space of all typical sequences $s^n$ into $2^{nR}$ bins, and the bin index $f_i$ is transmitted at time $i$. The decoder is required to output $s_i^n$ in one of two ways. If it has access to $s_{i-1}^n$, then it finds a sequence jointly typical with $s_{i-1}^n$ in the bin with index $f_i$. This succeeds with high probability if $R > H(s_1 \mid s_0)$, which is clearly satisfied by (34).

If the receiver needs to recover from an erasure burst spanning $t \in \{j-B, \ldots, j-1\}$, it has access to $s_{j-B-1}^n$ and needs to use $f_j^{j+T}$ to recover $s_j^n$. It simultaneously attempts to decode all of $s_j^n, \ldots, s_{j+T}^n$ using $f_j, \ldots, f_{j+T}$ and $s_{j-B-1}^n$. This succeeds if $(T+1)R > H(s_j, \ldots, s_{j+T} \mid s_{j-B-1})$, which in turn holds via (35).

B. Converse 

The basic idea behind the converse is illustrated in Fig. 12. We consider a periodic erasure channel with period $p = B + T + 1$. The $k$-th period, for $k \ge 1$, spans the interval $[(k-1)(B+T+1)+1, k(B+T+1)]$. In each period the first $B$ packets are erased, whereas the remaining $T+1$ packets $f_{kB+k+(k-1)T}, \ldots, f_{k(B+T+1)}$ are not erased. For the sake of convenience, we denote the lower and upper end-points of the $k$-th interval by $l_k = (k-1)(B+T+1)+1$ and $u_k = k(B+T+1)$. The beginning of the un-erased symbols in the $k$-th interval is $e_k = kB + k + (k-1)T$. Furthermore, for the sake of compactness, we denote the $n$-letter sequence $s^n$ by $\mathbf{s}$, i.e., using the bold-face font.

We provide a heuristic argument that is then formalized below. Consider the first period spanning time $[1, B+T+1]$. Recall that the first $B$ channel packets are erased. The source sequence $\mathbf{s}_{B+1}$ corresponding to the first un-erased channel packet is recovered by the end of the period, i.e., by time $t = B + T + 1$. As soon as it is recovered, the decoding of the remaining source sequences in $[B+2, B+T+1]$ is transparent to any previous erasures due to the Markov nature of the source.

Thus, for the recovery of the sources $\mathbf{s}_{B+2}, \ldots, \mathbf{s}_{B+T+1}$, the relevant erasure burst of length $B$ spans the interval $[B+T+2, 2B+T+1]$. All these source sequences are recovered by their deadline, and in particular before the end of the second period.

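Continuing this argument, if we consider a total of $N$ periods, then we have a total of $N(T+1)$ channel packets and recover $\{\mathbf{s}_{e_k}, \ldots, \mathbf{s}_{u_k}\}_{1 \le k \le N-1}$. Thus we have

$$\begin{aligned}
N(T+1)nR &\ge H\big(\{\mathbf{s}_{e_k}, \ldots, \mathbf{s}_{u_k}\}_{k=1}^{N-1} \mid \mathbf{s}_0\big) \qquad (136)\\
&= (N-1)nH(s_{B+1}, \ldots, s_{B+T+1} \mid s_0), \qquad (137)
\end{aligned}$$

which reduces to (35) as we take $N \to \infty$.

For the formal converse, first observe that

$$\begin{aligned}
H\big(f_{e_1}^{u_1}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_0\big) &\ge H\big(\mathbf{s}_{e_1}, f_{e_1}^{u_1}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_0\big) - H\big(\mathbf{s}_{e_1} \mid f_{e_1}^{u_1}, \mathbf{s}_0\big) \qquad (138)\\
&\ge H\big(\mathbf{s}_{e_1}, f_{e_1}^{u_1}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_0\big) - n\epsilon_n \qquad (139)\\
&= H\big(\mathbf{s}_{e_1}, f_{e_1} \mid \mathbf{s}_0\big) + H\big(f_{e_1+1}^{u_1}, f_{e_2}^{u_2}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_{e_1}, f_{e_1}, \mathbf{s}_0\big) - n\epsilon_n \qquad (140)\\
&\ge nH(s_{B+1} \mid s_0) + H\big(f_{e_1+1}^{u_1}, f_{e_2}^{u_2}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_{e_1}, f_0^{e_1}, \mathbf{s}_0\big) - n\epsilon_n, \qquad (141)
\end{aligned}$$

where we use Fano's inequality in (139), since $\mathbf{s}_{B+1}$ can be recovered from $(\mathbf{s}_0, f_{B+1}^{B+T+1})$ due to the delay constraint of $T$ symbols, while (141) follows from the fact that conditioning in the second term only reduces entropy.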

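We further simplify the second term in (141) as follows:

$$\begin{aligned}
&H\big(f_{e_1+1}^{u_1}, f_{e_2}^{u_2}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_{e_1}, f_0^{e_1}, \mathbf{s}_0\big)\\
&\quad\ge H\big(\mathbf{s}_{e_1+1}^{u_1}, f_{e_1+1}^{u_1}, f_{e_2}^{u_2}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_{e_1}, f_0^{e_1}, \mathbf{s}_0\big) - H\big(\mathbf{s}_{e_1+1}^{u_1} \mid f_0^{u_1}, f_{e_2}^{u_2}, \mathbf{s}_0\big) \qquad (142)\\
&\quad\ge H\big(\mathbf{s}_{e_1+1}^{u_1}, f_{e_1+1}^{u_1}, f_{e_2}^{u_2}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_{e_1}, f_0^{e_1}, \mathbf{s}_0\big) - n\epsilon_n \qquad (143)\\
&\quad= H\big(\mathbf{s}_{e_1+1}^{u_1}, f_{e_1+1}^{u_1} \mid \mathbf{s}_{e_1}, \mathbf{s}_0, f_0^{e_1}\big) + H\big(f_{e_2}^{u_2}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_0, \mathbf{s}_{e_1}^{u_1}, f_0^{u_1}\big) - n\epsilon_n \qquad (144)\\
&\quad\ge H\big(\mathbf{s}_{e_1+1}^{u_1} \mid \mathbf{s}_{e_1}\big) + H\big(f_{e_2}^{u_2}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_0, \mathbf{s}_{e_1}^{u_1}, f_0^{u_1}\big) - n\epsilon_n \qquad (145)\\
&\quad= nTH(s_1 \mid s_0) + H\big(f_{e_2}^{u_2}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_0, \mathbf{s}_{e_1}^{u_1}, f_0^{u_1}\big) - n\epsilon_n, \qquad (146)
\end{aligned}$$

where (143) follows from Fano's inequality, since all the sequences $\mathbf{s}_{B+2}, \ldots, \mathbf{s}_{B+T+1}$ are recovered by time $u_2 = 2B + 2T + 2$ when the $B$ packets in the interval $[B+T+2, 2B+T+1]$ are erased, (145) follows from the fact that $(\mathbf{s}_0, f_0^{B+1}) \to \mathbf{s}_{B+1} \to (\mathbf{s}_{B+2}, \ldots, \mathbf{s}_{B+T+1})$, and (146) follows because the source sequences are memoryless and form a Markov chain.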

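Following the same steps as in (141)-(146), we have

$$\begin{aligned}
&H\big(f_{e_2}^{u_2}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_0, \mathbf{s}_{e_1}^{u_1}, f_0^{u_1}\big) \qquad (147)\\
&\quad\ge nH(s_{B+1} \mid s_0) + H\big(f_{e_2+1}^{u_2}, f_{e_3}^{u_3}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_0, \mathbf{s}_{e_1}^{u_1}, \mathbf{s}_{e_2}, f_0^{e_2}\big) - n\epsilon_n \qquad (148)\\
&\quad\ge nH(s_{B+1} \mid s_0) + nTH(s_1 \mid s_0) + H\big(f_{e_3}^{u_3}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_0, \mathbf{s}_{e_1}^{u_1}, \mathbf{s}_{e_2}^{u_2}, f_0^{u_2}\big) - 2n\epsilon_n. \qquad (149)
\end{aligned}$$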
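Continuing these steps, we have that

$$\begin{aligned}
N(T+1)nR &\ge H\big(f_{e_1}^{u_1}, \ldots, f_{e_N}^{u_N} \mid \mathbf{s}_0\big) \qquad (150)\\
&\ge nNH(s_{B+1} \mid s_0) + nT(N-1)H(s_1 \mid s_0)\\
&\quad + H\big(f_{e_N+1}^{u_N} \mid \mathbf{s}_0, \mathbf{s}_{e_1}^{u_1}, \ldots, \mathbf{s}_{e_{N-1}}^{u_{N-1}}, \mathbf{s}_{e_N}, f_0^{e_N}\big) - (2N-1)n\epsilon_n. \qquad (151)
\end{aligned}$$

Dividing by $N(T+1)n$ and taking $n \to \infty$ and thereafter $N \to \infty$, we recover (35).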
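The last limiting step can be visualized numerically. After letting $n \to \infty$ and dropping the nonnegative last entropy term in (151), the normalized bound is $\frac{NH(s_{B+1}\mid s_0) + T(N-1)H(s_1\mid s_0)}{N(T+1)}$, which increases with $N$ toward $\frac{1}{T+1}H(s_{B+1}, \ldots, s_{B+T+1} \mid s_0)$. The sketch below evaluates this for a hypothetical stationary binary symmetric Markov chain with crossover probability $q$, for which $H(s_k \mid s_0) = h_2\big((1-(1-2q)^k)/2\big)$; the chain, parameters and names are illustrative assumptions.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def cond_entropy_k(q, k):
    """H(s_k | s_0) for a binary symmetric Markov chain with crossover q."""
    return h2((1 - (1 - 2 * q) ** k) / 2)

def finite_n_bound(q, B, T, N):
    """Normalized right side of (151) with n -> infinity and the last term dropped."""
    return (N * cond_entropy_k(q, B + 1) + T * (N - 1) * cond_entropy_k(q, 1)) / (N * (T + 1))

def limit_rate(q, B, T):
    """H(s_{B+1}, ..., s_{B+T+1} | s_0) / (T + 1) = (H(s_{B+1}|s_0) + T H(s_1|s_0)) / (T + 1)."""
    return (cond_entropy_k(q, B + 1) + T * cond_entropy_k(q, 1)) / (T + 1)

q, B, T = 0.1, 2, 3
for N in (1, 2, 5, 10, 100, 1000):
    print(N, round(finite_n_bound(q, B, T, N), 5))
print("limit", round(limit_rate(q, B, T), 5))
```

The gap to the limit is exactly $\frac{T H(s_1 \mid s_0)}{N(T+1)}$, so it vanishes as $N \to \infty$.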



X. Conclusions 



We introduce an information-theoretic framework that characterizes the fundamental tradeoff between compression efficiency and error propagation in video streaming systems. We introduce the rate-recovery function and develop upper and lower bounds on it. The lower bound is established by drawing connections to a periodic erasure channel and to a multi-terminal source coding problem. We show that for first-order Markov sources the rate-recovery function equals the sum of the ideal predictive coding rate and a term that decreases as $1/(W+1)$. For the class of linear deterministic Markov sources, and for i.i.d. Gaussian sources with a sliding-window recovery constraint, we propose a new coding technique, prospicient coding, that achieves the rate-recovery function. Numerical results indicate significant gains over traditional techniques such as FEC-based schemes. For the class of symmetric sources and memoryless encoding, we establish the optimality of a random-binning-based scheme by drawing a connection to the Zig-Zag source network problem. The optimality of binning is also established when the error recovery window has length zero.

Several open problems remain in our proposed framework. A complete characterization of the rate-recovery function remains to be obtained. Better upper bounds could potentially be obtained by considering more elaborate schemes than the binning-based technique. Finally, extending this framework to lossy reconstructions beyond those considered in this paper is also a fruitful area of research.



Appendix A
Proof of Corollary 1

By the chain rule for entropy, the term in (7) can be written as



$$ H(s_{B+1}, s_{B+2}, \ldots, s_{B+W+1} \mid s_0) \tag{152} $$

$$ = H(s_{B+1} \mid s_0) + \sum_{k=1}^{W} H(s_{B+k+1} \mid s_0, s_{B+1}, \ldots, s_{B+k}) \tag{153} $$

$$ = H(s_{B+1} \mid s_0) + W\,H(s_1 \mid s_0) $$

$$ = H(s_{B+1} \mid s_0) - H(s_{B+1} \mid s_B, s_0) + H(s_{B+1} \mid s_B, s_0) + W\,H(s_1 \mid s_0) \tag{154} $$

$$ = H(s_{B+1} \mid s_0) - H(s_{B+1} \mid s_B, s_0) + H(s_{B+1} \mid s_B) + W\,H(s_1 \mid s_0) \tag{155} $$

$$ = I(s_{B+1}; s_B \mid s_0) + (W+1)\,H(s_1 \mid s_0) \tag{156} $$

$$ = (W+1)\,R^{+}(B,W) \tag{157} $$

where (153) follows from the Markov property

$$ s_0, s_{B+1}, \ldots, s_{B+k-1} \to s_{B+k} \to s_{B+k+1} \tag{158} $$

for any $k$. It also relies on the time invariance (stationarity) of the sources, which for each $k$ implies that

$$ H(s_{B+k+1} \mid s_{B+k}) = H(s_1 \mid s_0). \tag{159} $$

Note that in (154) we add and subtract the same term. Step (155) also follows from the Markov property (158) for $k = 0$.
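The chain (152) to (157) can be checked by brute force on a small example. The sketch makes three assumptions of its own: a binary symmetric Markov chain (uniform $s_0$, flip probability $p$), a single letter ($n = 1$), and hypothetical helper names. It enumerates the joint distribution of $(s_0, \ldots, s_{B+W+1})$ and compares the left-hand side of (152) with the right-hand side of (156).

```python
import itertools
import math


def markov_pmf(p, length):
    """Joint pmf of (s_0, ..., s_{length-1}): s_0 uniform on {0, 1},
    each step flips the previous bit with probability p."""
    pmf = {}
    for seq in itertools.product((0, 1), repeat=length):
        prob = 0.5
        for prev, cur in zip(seq, seq[1:]):
            prob *= p if prev != cur else 1.0 - p
        pmf[seq] = prob
    return pmf


def H(pmf, coords):
    """Entropy in bits of the marginal on the given coordinate indices."""
    marg = {}
    for seq, prob in pmf.items():
        key = tuple(seq[c] for c in coords)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * math.log2(q) for q in marg.values() if q > 0.0)


def cond_H(pmf, target, given):
    """H(target | given) = H(target, given) - H(given)."""
    return H(pmf, list(target) + list(given)) - H(pmf, list(given))


def corollary_sides(p, B, W):
    """Return (left side of (152), right side of (156)) in bits."""
    pmf = markov_pmf(p, B + W + 2)          # coordinates 0 .. B+W+1
    window = list(range(B + 1, B + W + 2))  # s_{B+1}, ..., s_{B+W+1}
    lhs = cond_H(pmf, window, [0])
    # I(s_{B+1}; s_B | s_0) = H(s_{B+1} | s_0) - H(s_{B+1} | s_B, s_0)
    mi = cond_H(pmf, [B + 1], [0]) - cond_H(pmf, [B + 1], [B, 0])
    rhs = mi + (W + 1) * cond_H(pmf, [1], [0])
    return lhs, rhs


if __name__ == "__main__":
    print(corollary_sides(0.1, 2, 1))
```

Nothing in `corollary_sides` assumes the Markov property. The equality therefore comes out of the enumeration itself, which is the point of the check.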

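Appendix B
Proof of Lemma 1

First, let us define the following notation.
• For a vector $x$ of size $x$, define $x^{(u,a)}$ and $x^{(d,a)}$ such that

$$ x = \begin{pmatrix} x^{(u,a)} \\ x^{(d,a)} \end{pmatrix}, \tag{160} $$

where $x^{(u,a)}$ has size $a$ and $x^{(d,a)}$ has size $x - a$.
• For a matrix $X$ of size $x \times y$, define $X^{(l,a)}$, $X^{(r,a)}$, $X^{(u,b)}$ and $X^{(d,b)}$ as

$$ X = \begin{pmatrix} X^{(l,a)} & X^{(r,a)} \end{pmatrix}, \tag{161} $$

where $X^{(l,a)}$ has $a$ columns and $X^{(r,a)}$ has $y - a$ columns, and

$$ X = \begin{pmatrix} X^{(u,b)} \\ X^{(d,b)} \end{pmatrix}, \tag{162} $$

where $X^{(u,b)}$ has $b$ rows and $X^{(d,b)}$ has $x - b$ rows.
• For a square matrix $X$ of size $x$, define the matrices $X^{(ul,a)}$, $X^{(ur,a)}$, $X^{(dl,a)}$ and $X^{(dr,a)}$ as

$$ X = \begin{pmatrix} X^{(ul,a)} & X^{(ur,a)} \\ X^{(dl,a)} & X^{(dr,a)} \end{pmatrix}, \tag{163} $$

where $X^{(ul,a)}$ is $a \times a$ and $X^{(dr,a)}$ is $(x-a) \times (x-a)$.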
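The block notation in (160) to (163) maps directly onto array slicing. The following is a minimal NumPy sketch. The helper names are ours and purely illustrative; zero-based slicing `x[:a]` plays the role of "the first $a$ entries".

```python
import numpy as np

# Hypothetical helpers mirroring (160)-(163); they only illustrate the indexing.


def vec_u(x, a):   # x^{(u,a)}: first a entries
    return x[:a]


def vec_d(x, a):   # x^{(d,a)}: remaining len(x) - a entries
    return x[a:]


def mat_l(X, a):   # X^{(l,a)}: first a columns
    return X[:, :a]


def mat_r(X, a):   # X^{(r,a)}: remaining columns
    return X[:, a:]


def mat_u(X, b):   # X^{(u,b)}: first b rows
    return X[:b, :]


def mat_d(X, b):   # X^{(d,b)}: remaining rows
    return X[b:, :]


def quad(X, a):
    """Split a square matrix into its (ul, ur, dl, dr) blocks, as in (163)."""
    return X[:a, :a], X[:a, a:], X[a:, :a], X[a:, a:]


if __name__ == "__main__":
    X = np.arange(16).reshape(4, 4)
    ul, ur, dl, dr = quad(X, 1)
    print(ul.shape, ur.shape, dl.shape, dr.shape)
```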



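We introduce an iterative method to define the transformation $C_f$.

Step 0: Let $N_1 = \mathrm{Rank}(A)$. If $A = 0$ or $N_1 = N_d$, the source is in the form of (76), and thus $C_f(s_i) = s_i$. Otherwise, continue to the next step.

Step 1: Without loss of generality, we assume that the first $N_1$ rows of matrix $A$ are independent¹. Let $R_{1,0}$ denote the first $N_1$ rows of $A$, and

$$ A^{(d,N_1)} = V_1 R_{1,0}, \tag{164} $$

where $V_1$ is an $(N_d - N_1) \times N_1$ matrix relating the dependent rows of $A$ to $R_{1,0}$. Also define the invertible square matrix $M_1$ as

$$ M_1 \triangleq \begin{pmatrix} I_{N_1} & 0 \\ -V_1 & I_{N_d-N_1} \end{pmatrix}. \tag{165} $$

Note that

$$ M_1^{-1} = \begin{pmatrix} I_{N_1} & 0 \\ V_1 & I_{N_d-N_1} \end{pmatrix}. \tag{166} $$

Define

$$ \begin{pmatrix} s_{i,1} \\ \tilde{s}_{i,1} \end{pmatrix} \triangleq \begin{pmatrix} (M_1 s_{i,d})^{(u,N_1)} \\ (M_1 s_{i,d})^{(d,N_1)} \end{pmatrix} = M_1 s_{i,d}. \tag{167} $$

We have

$$ \begin{pmatrix} s_{i,1} \\ \tilde{s}_{i,1} \end{pmatrix} = \begin{pmatrix} M_1 A & M_1 B M_1^{-1} \end{pmatrix} \begin{pmatrix} s_{i-1,0} \\ M_1 s_{i-1,d} \end{pmatrix} \tag{168} $$

$$ = \begin{pmatrix} R_{1,0} & (M_1BM_1^{-1})^{(ul,N_1)} & (M_1BM_1^{-1})^{(ur,N_1)} \\ 0 & (M_1BM_1^{-1})^{(dl,N_1)} & (M_1BM_1^{-1})^{(dr,N_1)} \end{pmatrix} \begin{pmatrix} s_{i-1,0} \\ (M_1 s_{i-1,d})^{(u,N_1)} \\ (M_1 s_{i-1,d})^{(d,N_1)} \end{pmatrix} \tag{169} $$

$$ = \begin{pmatrix} R_{1,0} & R_{1,1} & R'_{1,2} \\ 0 & A^{(1)} & B^{(1)} \end{pmatrix} \begin{pmatrix} s_{i-1,0} \\ s_{i-1,1} \\ \tilde{s}_{i-1,1} \end{pmatrix}, \tag{170} $$

where $A^{(1)} = (M_1BM_1^{-1})^{(dl,N_1)}$ and $B^{(1)} = (M_1BM_1^{-1})^{(dr,N_1)}$. The other matrices are defined similarly; that is, $R_{1,1} = (M_1BM_1^{-1})^{(ul,N_1)}$ and $R'_{1,2} = (M_1BM_1^{-1})^{(ur,N_1)}$. The zero block in (169) holds because, by (164) and (165), the lower $N_d - N_1$ rows of $M_1 A$ equal $A^{(d,N_1)} - V_1 R_{1,0} = 0$. So far, $s_{i,1}$ has been defined.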
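Step 1 can be sketched numerically. The sketch assumes the deterministic update $s_{i,d} = A s_{i-1,0} + B s_{i-1,d}$ that (168) implicitly uses, and real arithmetic for illustration (the identities themselves hold over any field). The function name `step1_transform` is hypothetical. It solves (164) for $V_1$, builds $M_1$ and $M_1^{-1}$ from (165) and (166), and returns $A^{(1)}$ and $B^{(1)}$ as the (dl) and (dr) blocks of $M_1 B M_1^{-1}$.

```python
import numpy as np


def step1_transform(A, B):
    """Sketch of Step 1 for s_{i,d} = A s_{i-1,0} + B s_{i-1,d}.

    Assumes (after a row permutation, see the footnote) that the first
    N1 = rank(A) rows of A are linearly independent.
    """
    Nd = A.shape[0]
    N1 = int(np.linalg.matrix_rank(A))
    if N1 == 0 or N1 == Nd:
        raise ValueError("Step 0 applies: A = 0 or A has full row rank")
    R10 = A[:N1, :]                                # R_{1,0}
    # Solve V1 R10 = A^{(d,N1)} (eq. (164)); the system is consistent because
    # the lower rows of A lie in the row space of R10.
    V1 = np.linalg.lstsq(R10.T, A[N1:, :].T, rcond=None)[0].T
    I_top, I_bot = np.eye(N1), np.eye(Nd - N1)
    Z = np.zeros((N1, Nd - N1))
    M1 = np.block([[I_top, Z], [-V1, I_bot]])      # eq. (165)
    M1_inv = np.block([[I_top, Z], [V1, I_bot]])   # eq. (166)
    Bt = M1 @ B @ M1_inv                           # M1 B M1^{-1}
    A1 = Bt[N1:, :N1]                              # A^{(1)}: (dl, N1) block
    B1 = Bt[N1:, N1:]                              # B^{(1)}: (dr, N1) block
    return N1, R10, V1, M1, M1_inv, A1, B1
```

As a usage check, one can confirm that the lower block of $M_1 A$ vanishes, that $M_1 M_1^{-1} = I$, and that $\mathrm{Rank}(A^{(1)}) \le \min\{N_1, N_d - N_1\}$, which is the bound (171) used in Step 2.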
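Step 2: Define $N_2 = \mathrm{Rank}(A^{(1)})$. In general,

$$ N_2 \le \min\{N_1, N_d - N_1\}. \tag{171} $$

¹By rearranging the rows of matrices $A$ and $B$, this assumption can always be satisfied.

If $N_2 = N_d - N_1$ or if $A^{(1)}$ is the zero matrix, set $s_{i,2} = \tilde{s}_{i,1}$ and

$$ C_f(s_i) = \begin{pmatrix} s_{i,0} \\ s_{i,1} \\ s_{i,2} \end{pmatrix}. \tag{172} $$

If $A^{(1)} \neq 0$ and $N_2 < N_d - N_1$, we again assume that the first $N_2$ rows of $A^{(1)}$, denoted by $R_{2,1}$, are independent, and

$$ A^{(1)(d,N_2)} = V_2 R_{2,1}. \tag{173} $$

Also define the invertible matrix $M_2$ as

$$ M_2 = \begin{pmatrix} I_{N_2} & 0 \\ -V_2 & I_{N_d-N_1-N_2} \end{pmatrix} \tag{174} $$

and

$$ \begin{pmatrix} s_{i,2} \\ \tilde{s}_{i,2} \end{pmatrix} \triangleq \begin{pmatrix} (M_2 \tilde{s}_{i,1})^{(u,N_2)} \\ (M_2 \tilde{s}_{i,1})^{(d,N_2)} \end{pmatrix} = M_2 \tilde{s}_{i,1}. \tag{175} $$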



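We have

$$ \begin{pmatrix} s_{i,1} \\ M_2 \tilde{s}_{i,1} \end{pmatrix} = \begin{pmatrix} R_{1,0} & R_{1,1} & R'_{1,2} M_2^{-1} \\ 0 & M_2 A^{(1)} & M_2 B^{(1)} M_2^{-1} \end{pmatrix} \begin{pmatrix} s_{i-1,0} \\ s_{i-1,1} \\ M_2 \tilde{s}_{i-1,1} \end{pmatrix}, \tag{176} $$

and (176) is equivalent to

$$ \begin{pmatrix} s_{i,1} \\ s_{i,2} \\ \tilde{s}_{i,2} \end{pmatrix} = \begin{pmatrix} R_{1,0} & R_{1,1} & (R'_{1,2}M_2^{-1})^{(l,N_2)} & (R'_{1,2}M_2^{-1})^{(r,N_2)} \\ 0 & (M_2A^{(1)})^{(u,N_2)} & (M_2B^{(1)}M_2^{-1})^{(ul,N_2)} & (M_2B^{(1)}M_2^{-1})^{(ur,N_2)} \\ 0 & (M_2A^{(1)})^{(d,N_2)} & (M_2B^{(1)}M_2^{-1})^{(dl,N_2)} & (M_2B^{(1)}M_2^{-1})^{(dr,N_2)} \end{pmatrix} \begin{pmatrix} s_{i-1,0} \\ s_{i-1,1} \\ (M_2\tilde{s}_{i-1,1})^{(u,N_2)} \\ (M_2\tilde{s}_{i-1,1})^{(d,N_2)} \end{pmatrix}, \tag{177} $$

which can be written as

$$ \begin{pmatrix} s_{i,1} \\ s_{i,2} \\ \tilde{s}_{i,2} \end{pmatrix} = \begin{pmatrix} R_{1,0} & R_{1,1} & R_{1,2} & R'_{1,3} \\ 0 & R_{2,1} & R_{2,2} & R'_{2,3} \\ 0 & 0 & A^{(2)} & B^{(2)} \end{pmatrix} \begin{pmatrix} s_{i-1,0} \\ s_{i-1,1} \\ s_{i-1,2} \\ \tilde{s}_{i-1,2} \end{pmatrix}, \tag{178} $$

since $(M_2A^{(1)})^{(u,N_2)} = R_{2,1}$ and $(M_2A^{(1)})^{(d,N_2)} = 0$ by (173) and (174). The remaining blocks are named as in Step 1. Note that $s_{i,2}$ is defined in this step.

This procedure can be repeated through the next steps until the $(K-1)$th step, where $A^{(K-1)}$ is either full-rank of rank $N_K$ or the zero matrix. In this step, define $R_{K,K-1} = A^{(K-1)}$ and $s_{i,K} = \tilde{s}_{i,K-1}$. The result is

$$ C_f(s_i) = \begin{pmatrix} s_{i,0} \\ s_{i,1} \\ \vdots \\ s_{i,K} \end{pmatrix}. \tag{179} $$

Similar to (166) and (171), (178) can be verified for all the steps. Note that all the steps are invertible. This completes the proof of Lemma 1.

Appendix C
Proof of Lemma 2

Consider a source consisting of $N_0$ innovation bits and $K$ deterministic sub-symbols $s_{i,k}$ defined in (76). The following iterative method characterizes the transformation $C_b$.

Step 0: If $R_{K,K-1} = 0$, we have

$$ s_{i,K} = R_{K,K}\, s_{i-1,K} \tag{180} $$

$$ = R_{K,K}^{\,i+1}\, s_{-1,K}. \tag{181} $$

Note that $s_{-1}$, and thus $s_{i,K}$, is known at the decoder. Therefore, we can eliminate the sub-symbol $s_{i,K}$ and consider the source with $N_0$ innovation bits and deterministic bits characterized by

$$ \begin{pmatrix} s_{i,1} \\ s_{i,2} \\ \vdots \\ s_{i,K-1} \end{pmatrix} = \begin{pmatrix} R_{1,0} & R_{1,1} & \cdots & R_{1,K-2} & R_{1,K-1} \\ 0 & R_{2,1} & \cdots & R_{2,K-2} & R_{2,K-1} \\ \vdots & & \ddots & & \vdots \\ 0 & \cdots & 0 & R_{K-1,K-2} & R_{K-1,K-1} \end{pmatrix} \begin{pmatrix} s_{i-1,0} \\ s_{i-1,1} \\ \vdots \\ s_{i-1,K-1} \end{pmatrix}, \tag{182} $$

and continue to the next step with $K - 1$. Note that, knowing $s_{-1}$, the sub-symbol $s_{i,K}$ can be constructed.

If $R_{K,K-1}$ is full-rank of rank $N_K$, continue to the next step.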
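Step 0 rests on the closed form (181). The minimal check below makes two assumptions of its own: the sub-symbols are bit vectors combined with arithmetic modulo 2, and $R_{K,K}$ is an arbitrary example matrix. It shows that iterating (180) from $s_{-1,K}$ agrees with $R_{K,K}^{\,i+1} s_{-1,K}$. This is why a decoder that knows $s_{-1}$ can compute $s_{i,K}$ without receiving anything. The function names are hypothetical.

```python
import numpy as np


def iterate_last_subsymbol(R_KK, s_minus1_K, i):
    """Run s_{i,K} = R_{K,K} s_{i-1,K} (mod 2) forward from s_{-1,K}, eq. (180)."""
    s = s_minus1_K.copy()
    for _ in range(i + 1):
        s = (R_KK @ s) % 2
    return s


def closed_form(R_KK, s_minus1_K, i):
    """Closed form (181): R_{K,K}^{i+1} s_{-1,K}, reduced mod 2.
    Reducing after the integer power is valid since reduction mod 2 commutes
    with matrix products (entries stay small for the small i used here)."""
    P = np.linalg.matrix_power(R_KK, i + 1) % 2
    return (P @ s_minus1_K) % 2


if __name__ == "__main__":
    R = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
    s_init = np.array([1, 0, 1])
    print(iterate_last_subsymbol(R, s_init, 3), closed_form(R, s_init, 3))
```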
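Step 1: Define

$$ \bar{s}_{i,K-1} \triangleq s_{i,K-1} + X_1 s_{i,K}. \tag{183} $$

Also define

$$ D_1 \triangleq \begin{pmatrix} I_{N_{K-1}} & X_1 \\ 0 & I_{N_K} \end{pmatrix}, \tag{184} $$

so that $(\bar{s}_{i,K-1}; s_{i,K}) = D_1 (s_{i,K-1}; s_{i,K})$, and note that

$$ D_1^{-1} = \begin{pmatrix} I_{N_{K-1}} & -X_1 \\ 0 & I_{N_K} \end{pmatrix}. \tag{185} $$

Also, $X_1$ can be defined such that

$$ R_{K,K} - R_{K,K-1} X_1 = 0, \tag{186} $$

which is possible because $R_{K,K-1}$ has full row rank $N_K$. By these definitions, (76) can be reformulated as

$$ \begin{pmatrix} s_{i,1} \\ \vdots \\ s_{i,K-2} \\ \bar{s}_{i,K-1} \\ s_{i,K} \end{pmatrix} = \begin{pmatrix} R_{1,0} & R_{1,1} & \cdots & R_{1,K-2} & R^{(1)}_{1,K-1} & R^{(1)}_{1,K} \\ 0 & R_{2,1} & \cdots & R_{2,K-2} & R^{(1)}_{2,K-1} & R^{(1)}_{2,K} \\ \vdots & & \ddots & & \vdots & \vdots \\ 0 & \cdots & 0 & R_{K-1,K-2} & R^{(1)}_{K-1,K-1} & R^{(1)}_{K-1,K} \\ 0 & \cdots & 0 & 0 & R_{K,K-1} & 0 \end{pmatrix} \begin{pmatrix} s_{i-1,0} \\ s_{i-1,1} \\ \vdots \\ \bar{s}_{i-1,K-1} \\ s_{i-1,K} \end{pmatrix}. \tag{187} $$

The matrices $R^{(1)}$ can be defined accordingly.

Step $j \in [2:K]$: Define $l \triangleq K - j$. At step $j$, the source is transformed into the form

$$ \begin{pmatrix} s_{i,1} \\ \vdots \\ s_{i,l} \\ s_{i,l+1} \\ s_{i,l+2} \\ \vdots \\ s_{i,K} \end{pmatrix} = \begin{pmatrix} R_{1,0} & \cdots & R^{(j-1)}_{1,l} & R^{(j-1)}_{1,l+1} & R^{(j-1)}_{1,l+2} & \cdots & R^{(j-1)}_{1,K} \\ \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & R^{(j-1)}_{l,l} & R^{(j-1)}_{l,l+1} & R^{(j-1)}_{l,l+2} & \cdots & R^{(j-1)}_{l,K} \\ 0 & \cdots & R_{l+1,l} & R^{(j-1)}_{l+1,l+1} & R^{(j-1)}_{l+1,l+2} & \cdots & R^{(j-1)}_{l+1,K} \\ 0 & \cdots & 0 & R_{l+2,l+1} & 0 & \cdots & 0 \\ \vdots & & & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & R_{K,K-1} & 0 \end{pmatrix} \begin{pmatrix} s_{i-1,0} \\ \vdots \\ s_{i-1,l} \\ s_{i-1,l+1} \\ s_{i-1,l+2} \\ \vdots \\ s_{i-1,K} \end{pmatrix}. \tag{188} $$

Now define

$$ D_j \triangleq \begin{pmatrix} I_{N_l} & X_{1,j} & X_{2,j} & \cdots & X_{j,j} \\ 0 & I_{N_{l+1}} & 0 & \cdots & 0 \\ 0 & 0 & I_{N_{l+2}} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I_{N_K} \end{pmatrix} \tag{189} $$

and note that

$$ D_j^{-1} = \begin{pmatrix} I_{N_l} & -X_{1,j} & -X_{2,j} & \cdots & -X_{j,j} \\ 0 & I_{N_{l+1}} & 0 & \cdots & 0 \\ 0 & 0 & I_{N_{l+2}} & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & I_{N_K} \end{pmatrix}. \tag{190} $$

Also define

$$ \bar{s}_{i,l} \triangleq \begin{pmatrix} I & X_{1,j} & X_{2,j} & \cdots & X_{j,j} \end{pmatrix} \begin{pmatrix} s_{i,l} \\ s_{i,l+1} \\ s_{i,l+2} \\ \vdots \\ s_{i,K} \end{pmatrix}. \tag{191} $$

By these definitions, the $(l+1)$th block row of (188) reduces to

$$ s_{i,l+1} = R_{l+1,l}\, \bar{s}_{i-1,l} + \sum_{k=1}^{j} \left( R^{(j-1)}_{l+1,l+k} - R_{l+1,l} X_{k,j} \right) s_{i-1,l+k}. \tag{192} $$

By defining the $X_{k,j}$ such that, for each $k \in \{1, 2, \ldots, j\}$,

$$ R^{(j-1)}_{l+1,l+k} - R_{l+1,l} X_{k,j} = 0, \tag{193} $$

it is not hard to see that (192) can be rewritten as

$$ \begin{pmatrix} s_{i,1} \\ \vdots \\ \bar{s}_{i,l} \\ s_{i,l+1} \\ s_{i,l+2} \\ \vdots \\ s_{i,K} \end{pmatrix} = \begin{pmatrix} R_{1,0} & \cdots & R^{(j)}_{1,l} & R^{(j)}_{1,l+1} & R^{(j)}_{1,l+2} & \cdots & R^{(j)}_{1,K} \\ \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & R^{(j)}_{l,l} & R^{(j)}_{l,l+1} & R^{(j)}_{l,l+2} & \cdots & R^{(j)}_{l,K} \\ 0 & \cdots & R_{l+1,l} & 0 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & R_{l+2,l+1} & 0 & \cdots & 0 \\ \vdots & & & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & R_{K,K-1} & 0 \end{pmatrix} \begin{pmatrix} s_{i-1,0} \\ \vdots \\ \bar{s}_{i-1,l} \\ s_{i-1,l+1} \\ s_{i-1,l+2} \\ \vdots \\ s_{i-1,K} \end{pmatrix}, \tag{194} $$

whose $(l+1)$th row is block-diagonalized.

After these steps, the source is transformed into the diagonally correlated Markov source $s_i$, with innovation bits $s_{i,0}$ and deterministic bits given by

$$ s_{i,d} = \begin{pmatrix} s_{i,1} \\ s_{i,2} \\ \vdots \\ s_{i,K} \end{pmatrix} = \begin{pmatrix} R_{1,0} & 0 & \cdots & 0 \\ 0 & R_{2,1} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & R_{K,K-1} \end{pmatrix} \begin{pmatrix} s_{i-1,0} \\ s_{i-1,1} \\ \vdots \\ s_{i-1,K-1} \end{pmatrix}. \tag{195} $$

All the steps are invertible, and this completes the proof.

Appendix D
Generalization to Non-Integer Rates

Assume that the $R_j$ are rational². Then there exists an $\alpha$ with $0 < \alpha < 1$ such that $R'_j = R_j/\alpha$ is an integer for each $j \in \{0, 1, \ldots, B\}$. Define $n' = \alpha n$. Each codeword $h_j \in \{1, \ldots, 2^{nR_j}\} = \{1, \ldots, 2^{n'R'_j}\}$ can be represented by $n'$ i.i.d. $R'_j$-length bit-sequences $b_j^{n'}$.

²For irrational rates, consider a rational number in the $\epsilon$-neighborhood of $R_j$.

Similarly, for $j \in \{1, 2, \ldots, B\}$ define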



k=j k=j 
B 



(196) 



(197) 



k=Q k=0 

Now the same coding scheme can be applied to get the rate 

B 

4 (198) 



1 ^ 

nR = n'R'o + — — - V n' R'^ 

k—l 
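The appendix only asserts that a suitable $\alpha$ exists. One concrete choice is sketched below; it is our assumption and not necessarily the construction intended here. Take $\alpha = 1/L$, where $L$ is the least common multiple of the denominators of the rates, with $L \ge 2$ forced so that $\alpha < 1$. Then $n$ must be a multiple of $L$ so that $n' = \alpha n$ is an integer. The function name is hypothetical.

```python
from fractions import Fraction
from math import lcm


def integer_rescaling(rates):
    """Return alpha = 1/L and the integer rates R'_j = R_j / alpha.

    L is the lcm of the rate denominators, raised to 2 if needed so that
    0 < alpha < 1.  Rates are given as strings or numbers accepted by Fraction.
    """
    fracs = [Fraction(r) for r in rates]
    L = lcm(*(f.denominator for f in fracs))
    if L == 1:
        L = 2  # any integer multiplier works; 2 keeps alpha strictly below 1
    alpha = Fraction(1, L)
    scaled = [f / alpha for f in fracs]
    if any(s.denominator != 1 for s in scaled):
        raise ArithmeticError("rescaled rates are not integers")
    return alpha, [int(s) for s in scaled]


if __name__ == "__main__":
    alpha, scaled = integer_rescaling(["1/2", "3/4", "5/6"])
    n = 120                    # block length, chosen as a multiple of 1/alpha
    n_prime = alpha * n        # n' = alpha n
    print(alpha, scaled, n_prime)
```

With $n' = \alpha n$, every codebook size is unchanged, since $nR_j = n'R'_j$. This is exactly the identity $\{1, \ldots, 2^{nR_j}\} = \{1, \ldots, 2^{n'R'_j}\}$ used above.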



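Appendix E
Proof of Claim 1

We need to lower bound $H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1})$. Consider

$$ H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}) \tag{200} $$

$$ = I(t_{(k+1)p-1}; f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}) + H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}, t_{(k+1)p-1}) $$

$$ = h(t_{(k+1)p-1} \mid f_0^{kp-1}) - h(t_{(k+1)p-1} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}) + H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}, t_{(k+1)p-1}) \tag{201} $$

$$ = h(t_{(k+1)p-1}) - h(t_{(k+1)p-1} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}) + H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}, t_{(k+1)p-1}), \tag{202} $$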

where (202) holds because $t_{(k+1)p-1} = (s^n_{kp}, \ldots, s^n_{(k+1)p-1})$ is independent of $f_0^{kp-1}$, since the source sequences $s^n_i$ are generated i.i.d. By expanding $t_{(k+1)p-1}$, we have that

$$ h(t_{(k+1)p-1}) = h(s_{kp}, \ldots, s_{kp+B-1}) + h(s_{kp+B}, \ldots, s_{(k+1)p-1}) \tag{203} $$

and

$$ h(t_{(k+1)p-1} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}) = h(s_{kp}, \ldots, s_{kp+B-1} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}) + h(s_{kp+B}, \ldots, s_{(k+1)p-1} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}, s_{kp}, \ldots, s_{kp+B-1}). \tag{204} $$



We next show that

$$ h(s_{kp}, \ldots, s_{kp+B-1}) - h(s_{kp}, \ldots, s_{kp+B-1} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}) \ge \sum_{i=0}^{B-1} \frac{n}{2} \log \frac{1}{d_{B+W-i}} \tag{205} $$

and that

$$ h(s_{kp+B}, \ldots, s_{(k+1)p-1}) - h(s_{kp+B}, \ldots, s_{(k+1)p-1} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}, s_{kp}, \ldots, s_{kp+B-1}) + H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}, t_{(k+1)p-1}) \ge \frac{n(W+1)}{2} \log \frac{1}{d_0}. \tag{206} $$

The proof of Claim 1 follows from (202), (203), (204), (205) and (206).

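To establish (205), we use two facts: the source sequences are independent, and conditioning reduces differential entropy. Together they give

$$ h(s_{kp}, \ldots, s_{kp+B-1}) - h(s_{kp}, \ldots, s_{kp+B-1} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}) \ge \sum_{i=0}^{B-1} \left[ h(s_{kp+i}) - h(s_{kp+i} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}) \right]. \tag{207} $$

We show that for each $i = 0, 1, \ldots, B-1$,

$$ h(s_{kp+i}) - h(s_{kp+i} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}) \ge \frac{n}{2} \log \frac{1}{d_{B+W-i}}, \tag{208} $$

which then establishes (205).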

Recall that there is an erasure burst spanning $t \in [kp, kp+B-1]$. The receiver is therefore required to reconstruct

$$ t_{(k+1)p-1} = (s_{kp}, \ldots, s_{kp+B+W}) \tag{209} $$

with a distortion vector $(d_0, \ldots, d_{B+W})$. That is, when the decoder is revealed $(f_0^{kp-1}, f_{kp+B}^{(k+1)p-1})$, a reconstruction of $s_{kp+i}$ is desired with distortion $d_{B+W-i}$ for $i = 0, 1, \ldots, B+W$. Hence

$$ h(s_{kp+i}) - h(s_{kp+i} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}) = h(s_{kp+i}) - h\big(s_{kp+i} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}, \{\hat{s}_{kp+i}\}_{d_{B+W-i}}\big) \tag{210} $$

$$ \ge h(s_{kp+i}) - h\big(s_{kp+i} \mid \{\hat{s}_{kp+i}\}_{d_{B+W-i}}\big) \tag{211} $$

$$ \ge h(s_{kp+i}) - h\big(s_{kp+i} - \{\hat{s}_{kp+i}\}_{d_{B+W-i}}\big). \tag{212} $$



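Since

$$ \mathbb{E}\left[ \frac{1}{n} \sum_{j=1}^{n} \big( s_{kp+i,j} - \hat{s}_{kp+i,j} \big)^2 \right] \le d_{B+W-i}, \tag{213} $$

it follows from standard arguments [?, Chapter 13] that

$$ h\big(s_{kp+i} - \{\hat{s}_{kp+i}\}_{d_{B+W-i}}\big) \le \frac{n}{2} \log 2\pi e\, d_{B+W-i}. \tag{214} $$

Substituting (214) into (212), together with the fact that $h(s_{kp+i}) = \frac{n}{2} \log 2\pi e$, establishes (208).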
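Equality in (214) is attained when the reconstruction error is Gaussian with variance $d$. The sketch below simulates the standard backward test channel $s = \hat{s} + e$, with $\hat{s} \sim \mathcal{N}(0, 1-d)$ and $e \sim \mathcal{N}(0, d)$ independent, so that $s$ has unit variance as assumed in the text. It checks that the per-letter MSE is about $d$ and that $I(s; \hat{s}) \approx \frac{1}{2}\log_2(1/d)$ bits, which is the per-letter quantity on the right of (208). The sample size, seed and function name are arbitrary illustrative choices.

```python
import numpy as np


def gaussian_test_channel(d, n=200_000, seed=0):
    """Simulate s = s_hat + e with s_hat ~ N(0, 1-d), e ~ N(0, d) independent.

    Returns the empirical MSE and a plug-in estimate of I(s; s_hat) in bits,
    using the exact formula -0.5 * log2(1 - rho^2) for jointly Gaussian pairs.
    """
    rng = np.random.default_rng(seed)
    s_hat = rng.normal(0.0, np.sqrt(1.0 - d), n)
    e = rng.normal(0.0, np.sqrt(d), n)
    s = s_hat + e                       # unit-variance source sample
    mse = np.mean((s - s_hat) ** 2)     # empirical distortion, close to d
    rho = np.corrcoef(s, s_hat)[0, 1]   # sample correlation coefficient
    mi_bits = -0.5 * np.log2(1.0 - rho ** 2)
    return mse, mi_bits


if __name__ == "__main__":
    for d in (0.5, 0.25, 0.1):
        mse, mi = gaussian_test_channel(d)
        print(d, round(mse, 4), round(mi, 4), round(0.5 * np.log2(1.0 / d), 4))
```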
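It finally remains to establish (206). We have

$$ h(s_{kp+B}, \ldots, s_{(k+1)p-1}) - h(s_{kp+B}, \ldots, s_{(k+1)p-1} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}, s_{kp}, \ldots, s_{kp+B-1}) + H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}, t_{(k+1)p-1}) $$

$$ = h(s_{kp+B}, \ldots, s_{(k+1)p-1} \mid f_0^{kp-1}, s_{kp}, \ldots, s_{kp+B-1}) - h(s_{kp+B}, \ldots, s_{(k+1)p-1} \mid f_0^{kp-1}, f_{kp+B}^{(k+1)p-1}, s_{kp}, \ldots, s_{kp+B-1}) + H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}, t_{(k+1)p-1}) \tag{215} $$

$$ = I(s_{kp+B}, \ldots, s_{(k+1)p-1}; f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}, s_{kp}, \ldots, s_{kp+B-1}) + H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}, t_{(k+1)p-1}) \tag{216} $$

$$ = H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1}, s_{kp}, \ldots, s_{kp+B-1}) \tag{217} $$

$$ \ge I(f_{kp+B}^{(k+1)p-1}; s_{kp+B}, \ldots, s_{(k+1)p-1} \mid f_0^{kp+B-1}, s_{kp}, \ldots, s_{kp+B-1}). $$

Here (215) uses the independence of $(s_{kp+B}, \ldots, s_{(k+1)p-1})$ from $(f_0^{kp-1}, s_{kp}, \ldots, s_{kp+B-1})$. Step (217) follows from the chain rule, since $t_{(k+1)p-1} = (s_{kp}, \ldots, s_{(k+1)p-1})$. The last step holds because conditioning on $f_{kp}^{kp+B-1}$ can only reduce entropy, and a conditional entropy is never smaller than the corresponding conditional mutual information.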
The above mutual information term can be bounded as follows:

$$ I(f_{kp+B}^{(k+1)p-1}; s_{kp+B}, \ldots, s_{(k+1)p-1} \mid f_0^{kp+B-1}, s_{kp}, \ldots, s_{kp+B-1}) $$

$$ = h(s_{kp+B}, \ldots, s_{(k+1)p-1}) - h(s_{kp+B}, \ldots, s_{(k+1)p-1} \mid f_0^{(k+1)p-1}, s_{kp}, \ldots, s_{kp+B-1}) \tag{218} $$

$$ \ge h(s_{kp+B}, \ldots, s_{(k+1)p-1}) - h\big(s_{kp+B}, \ldots, s_{(k+1)p-1} \mid \{\hat{s}_{kp}\}_{d_0}, \ldots, \{\hat{s}_{(k+1)p-1}\}_{d_0}\big) \tag{219} $$

$$ \ge \sum_{j=0}^{W} \Big[ h(s_{kp+B+j}) - h\big(s_{kp+B+j} - \{\hat{s}_{kp+B+j}\}_{d_0}\big) \Big] \ge \sum_{j=0}^{W} \frac{n}{2} \log \frac{1}{d_0} = \frac{n(W+1)}{2} \log \frac{1}{d_0}. $$

Here (218) follows from the independence of $(s_{kp+B}, \ldots, s_{(k+1)p-1})$ from the past sequences. Step (219) holds because, given the entire past $f_0^{(k+1)p-1}$, each source sub-sequence must be reconstructed with distortion $d_0$. The last step follows the standard approach in the proof of the rate-distortion theorem. This establishes (206).
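Adding the right-hand sides of (205) and (206) gives the lower bound of Claim 1 on $H(f_{kp+B}^{(k+1)p-1} \mid f_0^{kp-1})$. The sketch below evaluates that bound per source letter, that is, divided by $n$ with logarithms in base 2, for a given distortion vector $(d_0, \ldots, d_{B+W})$. The function name is hypothetical. It assumes every $d_i$ lies in $(0, 1]$, which matches the unit-variance sources.

```python
import math


def claim1_bound_per_sample(d, B, W):
    """(RHS of (205) + RHS of (206)) / n, in bits.

    d is the distortion vector (d_0, ..., d_{B+W}); d[B+W-i] is the distortion
    required for s_{kp+i}, and d[0] applies to the W+1 sequences after the burst.
    """
    if len(d) != B + W + 1:
        raise ValueError("expected B + W + 1 distortion values")
    burst_part = sum(0.5 * math.log2(1.0 / d[B + W - i]) for i in range(B))
    window_part = (W + 1) * 0.5 * math.log2(1.0 / d[0])
    return burst_part + window_part


if __name__ == "__main__":
    print(claim1_bound_per_sample([0.1, 0.2, 0.4, 0.8], B=2, W=1))
```

The burst part depends only on the looser distortions $d_{W+1}, \ldots, d_{B+W}$, while the window part depends only on $d_0$. This mirrors the split of $t_{(k+1)p-1}$ in (203) and (204).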